ABSTRACT

Fano Sequential Decoding is a technique for communicating at a high information
rate and with a high reliability over a large class of channels. However, equipment
cost and variation in the time required to decode successive transmitted digits
limit its use. This work is concerned with the latter limitation.

Others have shown that the average processing time per decoded digit is small if
the information rate of the source is less than a rate $R_{\text{comp}}$. This report studies the
probability distribution of the processing time random variable and applies the results
to the buffer overflow probability, i.e., the probability that the decoding
delay forces incoming data to fill and overflow a finite buffer. It is shown that
the overflow probability is relatively insensitive to the buffer storage capacity and
to the computational speed of the decoder, but quite sensitive to information rate.
In particular, halving the source rate more than squares the overflow probability.
These sensitivities are found to be basic to Sequential Decoding and arise because
the computation per decoded digit is large during an interval of high channel
noise and grows exponentially with the length of such an interval.

A conjecture is presented concerning the exact behavior of the overflow probability
with information rate. This conjecture agrees well with the (limited) experimental
evidence available.

THE COMPUTATION PROBLEM WITH SEQUENTIAL DECODING


CHAPTER  I 
INTRODUCTION 

A.  BACKGROUND  AND  PREVIOUS  WORK 

The branch of statistical communication theory known as coding theory has received much
attention since the results of C. E. Shannon (Ref. 1) in 1948. Many investigators were and are attracted
to coding theory because of the potential for ultrareliable communication suggested by Shannon's
Noisy Coding Theorem. Loosely stated, this theorem says that data can be encoded for transmission
over a noisy channel in such a way that the probability of a decoding error is arbitrarily
small, provided that the information rate of the source is less than a rate called channel capacity;
the converse to the Noisy Coding Theorem essentially says that channel capacity is the largest
rate at which the probability of error can be made arbitrarily small.

The implications of the Coding Theorem are obviously stimulating. The fact that codes exist
for noisy channels which achieve small error probabilities while operating at a fixed information
rate is quite surprising. A priori, one would have expected that reliability could be achieved
only by repeating the transmitted message, that is, that reliability is obtainable only at the
expense of less information per unit time, i.e., a reduction in rate.

Although the Coding Theorem indicates the potential for ultrareliable communication, it has
been found that this ultrareliability costs a great deal either in equipment or in decoding delay.
Both costs are exorbitant if the decoder operates so as to strictly minimize error probability.
Practical considerations force one to consider less than optimum codes and decoders (in a
probability of error sense). A number of such codes and decoders have been invented. Included
among these various coding techniques are Massey's Threshold Decoding, Gallager's Low Density
Parity Check Codes, Bose-Chaudhuri Codes with the Peterson Decoding Procedure, Iterative
Decoding (Ref. 5), and Sequential Decoding (Refs. 6 to 8) as first presented by J. M. Wozencraft and later
modified by R. M. Fano (Ref. 9). Each of these procedures and others (Ref. 10) not mentioned find application
depending upon the performance requirements which are set and the economics of the application.
Sequential Decoders score reasonably well in both the performance and economic categories. We
shall concentrate on Sequential Decoding, and in particular on the Fano Sequential Decoding
Algorithm, in this report.

B.  FORMULATION  OF  PROBLEM 

In many ways, the Fano algorithm is an attractive decoding procedure. It applies to a large
variety of channels, in contrast with the algebraic codes such as Bose-Chaudhuri codes, which are
best adapted to symmetric channels with an equal number of inputs and outputs (which is a power
of a prime; see Ref. 4). The Fano algorithm is also recommended by the fact that it will operate with high
reliability at a substantial fraction of channel capacity. Thus, it is ideally suited for systems
handling high-quality, high-volume traffic.

The Fano algorithm, however, possesses at least two disadvantages. The first is that the
necessary encoding and decoding equipment is expensive. The second and more damaging
disadvantage of the Fano algorithm is that the time required to process the incoming data is
variable and assumes very large values during intervals of high channel noise. The variability of the
processing time requires that incoming data be buffered. The fact that this processing time
assumes large values implies that occasionally and eventually a finite buffer will fill and
overflow. After overflow, it is found that the decoder often performs erroneously. Such an event
is catastrophic unless moderated with periodic resynchronization, the use of a feedback channel,
or some other means.

Not only is overflow serious, but it occurs much more frequently than do undetected decoding
errors (i.e., errors without overflow). Thus, it is the controlling event in the design of the
decoder. Although the overflow event is serious, the decoder can be so designed and the
information rate be so restricted that overflows are very infrequent. It is, therefore, a problem
which can be resolved.

Our concern in this report is to obtain some understanding of the sensitivity of the overflow
probability to the following: the buffer capacity, the machine speed and the information (or
signaling) rate. This is a difficult analytical problem. As a result, we have been forced to
analyze the machine performance and to determine the various sensitivities indirectly. Our
approach to the overflow question has been to consider a random variable of computation (called
"static" computation) which is related to the computation performed by the machine as it decodes.
We have shown that the cumulative probability distribution function $P_R[C \geq L]$ of the random
variable of "static" computation $C$ is an algebraic function of the distribution parameter $L$,
that is, it behaves as $L^{-\alpha}$, $\alpha > 0$, for large $L$. From this behavior and a study of the exponent
$\alpha$, we have found through a heuristic argument that the probability of buffer overflow is relatively
insensitive to a change in machine speed or to the size of the buffer but that it is quite sensitive
to information rate, being more than squared by a halving of rate.

The deductions on the sensitivities of the overflow probability indicate that practical limits
on the size and speed of a decoder are set primarily by the overflow probability and that the
machine performance is really only sensitive to information rate. This sensitivity is due to the
fact that $P_R[C \geq L]$ behaves as $L^{-\alpha}$ for large $L$. We shall find that $P_R[C \geq L]$ behaves as $L^{-\alpha}$
for large $L$ because for every transmitted codeword there exists an interval of high channel
noise such that "static" computation is large and growing exponentially with the length of the
interval of high channel noise. The probability of such a noisy interval decreases exponentially
with the length of the interval. It is the balance between the two exponentials which forces the
algebraic nature of the distribution of "static" computation, $P_R[C \geq L]$. Since this same balance
is fundamental to the entire concept of Sequential Decoding, it does not appear that the buffer
overflow problem can be avoided unless some major modification of the decoding procedure can
be devised.
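
The balance between the two exponentials can be illustrated with a toy calculation. The sketch below is only an illustration under assumed conditions and is not the model analyzed in the later chapters: the names `beta` and `gamma` and their values are hypothetical, a noisy interval of length at least k is assumed to occur with probability exp(-beta k), and such an interval is assumed to cost exp(gamma k) computations. Under these assumptions the tail of the computation is exactly algebraic, with exponent beta/gamma.

```python
import math

def tail_probability(L, beta, gamma):
    """P[C >= L] in a toy model (an assumption made for illustration only):
    a noisy interval of length >= k has probability exp(-beta * k), and an
    interval of length k forces exp(gamma * k) computations."""
    # Shortest interval length whose computation reaches L.
    k = math.log(L) / gamma
    return math.exp(-beta * k)

beta, gamma = 1.0, 2.0      # hypothetical decay and growth rates
alpha = beta / gamma        # resulting algebraic exponent
for L in (10.0, 100.0, 1000.0):
    p = tail_probability(L, beta, gamma)
    print(f"L = {L:6.0f}   P[C >= L] = {p:.6f}   L^(-alpha) = {L ** -alpha:.6f}")
```

The two printed columns agree for every L, which is the sense in which an exponentially growing cost weighted by an exponentially decaying probability yields an algebraic distribution.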

These  results  and  arguments  are  explained  in  detail  in  the  following  chapters. 

Chapter  II  focuses  on  the  Fano  Sequential  Decoding  Algorithm.  The  algorithm  is  defined, 
motivated  and  discussed.  Many  of  its  properties  are  clearly  outlined.  The  buffer  overflow 
problem is discussed and the random variable of "static" computation is defined.

Chapter III is prefaced with a discussion of the connection between an exponential growth
in computation with the length of an interval of high channel noise and the algebraic nature of
the distribution of "static" computation. The main purpose of the chapter is to underbound the
distribution of "static" computation. A general underbound is found which applies to all codes
on the "completely connected" discrete memoryless channel (DMC). A lower bound is also
found for the (small) subset of codes which have fixed composition, again for the "completely
connected" DMC. Both bounds to $P_R[C \geq L]$ are algebraic in $L$.

Chapter IV concentrates on obtaining an upper bound to the distribution of "static" computation,
$P_R[C \geq L]$. Since there are "poor" codes, codes for which $P_R[C \geq L]$ is large so that
large computation occurs with high probability, we must establish that codes exist with a
$P_R[C \geq L]$ which decreases as an algebraic function in $L$. (It cannot decrease any faster
because of the lower bound result.) We show that such codes exist by averaging $P_R[C \geq L]$ over
the ensemble of all tree codes. This average is algebraic in $L$ so that many codes exist with
an algebraic distribution function.

Chapter V interprets the upper and lower bounds to $P_R[C \geq L]$, describes an experiment
performed at Lincoln Laboratory and compares the results of this experiment to the tail
behavior of $P_R[C \geq L]$, i.e., its behavior for large $L$. The comparison leads to a conjecture on
the true tail behavior of $P_R[C \geq L]$. It is shown that this conjecture has a very close
connection to some fundamental results in information theory which are expressed in the Coding
Theorem. Finally, a heuristic connection between the distribution of "static" computation and
the overflow probability is established and the sensitivity of the overflow probability to machine
speed, buffer size and information rate is brought out. Some problems deserving further
research are also suggested.

CHAPTER II

DESCRIPTION  OF  FANO  SEQUENTIAL  DECODING  ALGORITHM 


This chapter briefly discusses the encoding problem and introduces the Fano Sequential
Decoding Algorithm. The dynamics of the algorithm are described and a definition of computation
is presented. This chapter serves as preparation for the following analytical chapters.

A.  TREE  CODES 


Let us assume that the output of a source with a b-letter alphabet is encoded for transmission
on a discrete memoryless channel (DMC). (The DMC is characterized by the set of transition
probabilities $\{p(y_j/x_k)\}$, where $\{x_k\}$, $1 \leq k \leq K$, is the channel input alphabet and $\{y_j\}$, $1 \leq j \leq J$, is
the channel output alphabet.) Consider encoding the source by mapping a sequence of source
digits into a sequence of channel digits. The channel digits are selected from an array that
topologically resembles a tree and will henceforth be called a tree (see Fig. 1).

For the moment, consider mapping a finite sequence $\beta_n = (\beta_1, \beta_2, \ldots, \beta_n)$ of $n$ digits drawn
from the source alphabet onto a finite channel sequence $u_n = (\underline{u}_1, \underline{u}_2, \ldots, \underline{u}_n)$, where
$\underline{u}_q = (u_{q1}, \ldots, u_{qh}, \ldots, u_{q\ell})$ is a subsequence of $\ell$ digits (or a tree branch) drawn from the channel
input alphabet. At each node of the tree, the next source digit $\beta_q$ directs a path along the bottom
branch if $\beta_q = a_1$, along the second branch from the bottom if $\beta_q = a_2$, and along the top branch
if $\beta_q = a_b$, where $a_1, a_2, \ldots, a_b$ are the letters of the source alphabet. (A path
is a contiguous sequence of branches.) For example, the channel input sequence $u_3$ = (112, 010,
122) corresponds to the source sequence $\beta_3$ = (1, 0, 2) in Fig. 1 when the source and channel input
alphabets are both {0, 1, 2}.

The extended source sequence $\beta$ $(= \beta_\infty)$ specifies an infinite path $u$ $(= u_\infty)$ through the tree. The
path $u$ will be called the correct path. For each node of the correct path, say the $q$th,
$q = 0, 1, 2, \ldots$, where the 0th node is the origin, we define an "incorrect subset." The incorrect
subset at the $q$th node consists of (1) the $q$th node itself and (2) all nodes (of depth greater than $q$)
diverging from the $q$th node, which are not part of the correct path. For example, see Fig. 1,
where the incorrect subset at the 2nd node of the correct path is shown.

We shall find it useful to classify nodes in each incorrect subset. Consider the $q$th such
subset. Consider a node "at penetration $s$" in this subset (such a node is the terminus of a path
of $q + s$ branches). There are a number of nodes at this penetration $s$. Let the node in question
be $m$th from the bottom of this set of nodes. Then, it is uniquely identified by the triplet $(m, s, q)$.
This triplet indicates that the particular node is $m$th in rank among nodes at penetration $s$ in
the $q$th incorrect subset (see Fig. 1). The $q$th node of the correct path (or the reference node)
is identified by the triplet $(1, 0, q)$. (By convention, this single node is said to be at penetration
zero in the $q$th incorrect subset.) Denote by $M(s)$ the number of nodes at penetration $s$ in the
$q$th incorrect subset. Then,

$$M(0) = 1$$
$$M(1) = (b - 1)$$
$$M(2) = (b - 1)\,b$$
$$\vdots$$
$$M(s) = (b - 1)\,b^{s-1} \quad \text{for } s \geq 1. \tag{1}$$

There are $M(s)$ paths at penetration $s$ in the $q$th incorrect subset, and each of these paths
contains $q + s$ branches.
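
The count in Eq. (1) can be checked by direct enumeration. The following is a small sketch, with function names chosen here for illustration. It assumes, purely as a labeling convention, that the correct branch leaving the reference node is branch 0, so that a path belongs to the incorrect subset exactly when its first branch is not 0.

```python
from itertools import product

def nodes_at_penetration(b, s):
    """M(s) of Eq. (1): number of nodes at penetration s in an incorrect subset."""
    if s == 0:
        return 1                      # the reference node itself
    return (b - 1) * b ** (s - 1)

def count_by_enumeration(b, s):
    """Brute-force count of the same quantity.

    Paths of s branches leave the reference node; branch label 0 is taken
    (as an assumed convention) to be the correct branch, so only paths
    whose first branch differs from 0 lie in the incorrect subset.
    """
    if s == 0:
        return 1
    return sum(1 for path in product(range(b), repeat=s) if path[0] != 0)

for b in (2, 3):
    for s in range(5):
        print(b, s, nodes_at_penetration(b, s), count_by_enumeration(b, s))
```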

Given that $u_n = (\underline{u}_1, \ldots, \underline{u}_n)$ is transmitted, let $v_n = (\underline{v}_1, \underline{v}_2, \ldots, \underline{v}_n)$ be the received sequence,
where $\underline{v}_q = (v_{q1}, \ldots, v_{qh}, \ldots, v_{q\ell})$ is a subsequence of $\ell$ channel output digits. The
probability that $v_n$ is received when $u_n$ is transmitted is computed from the transition probabilities
of the DMC as follows:

$$P_R[v_n/u_n] = \prod_{q=1}^{n} P_R[\underline{v}_q/\underline{u}_q] = \prod_{q=1}^{n} \prod_{h=1}^{\ell} p[v_{qh}/u_{qh}] \tag{2}$$

where $p[v_{qh}/u_{qh}] = p[y_j/x_k]$ when $v_{qh} = y_j$ and $u_{qh} = x_k$.

The data (or signaling) rate (in bits per channel use) is defined as

$$R = \frac{\log_2 b}{\ell} \tag{3}$$

If the successive source digits are equally likely and statistically independent, then $R$ is also
the source entropy (or information rate) per transmitted digit. We shall assume that successive
source digits meet these conditions.
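
As a concrete illustration of Eqs. (2) and (3), the sketch below evaluates the sequence probability and the rate for a ternary example. The channel is an assumption made only for this example: a ternary symmetric channel in which a digit is received correctly with probability 1 - eps and is changed to each of the other two letters with probability eps/2. The function names and the value of `eps` are likewise hypothetical.

```python
import math

def sequence_probability(u, v, p):
    """P_R[v_n / u_n] of Eq. (2): product of per-digit transition probabilities.

    u, v: lists of branches, each branch a tuple of digits.
    p[x][y]: transition probability p(y / x) of the DMC.
    """
    prob = 1.0
    for u_q, v_q in zip(u, v):
        for x, y in zip(u_q, v_q):
            prob *= p[x][y]
    return prob

def rate(b, ell):
    """Eq. (3): rate in bits per channel use for a b-letter source and
    branches of ell channel digits."""
    return math.log2(b) / ell

eps = 0.05  # hypothetical crossover probability
p = [[1 - eps if x == y else eps / 2 for y in range(3)] for x in range(3)]

u = [(1, 1, 2), (0, 1, 0), (1, 2, 2)]   # transmitted branches
v = [(1, 0, 2), (0, 1, 0), (1, 2, 2)]   # received branches, one digit in error

print(sequence_probability(u, v, p))    # eight correct digits, one error
print(rate(3, 3))                       # log2(3) / 3 bits per channel use
```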
B. CONVOLUTIONAL CODES


Although  we  shall  later  assume  for  analytical  convenience  that  data  are  encoded  with  an 
arbitrary  tree  code,  we  present  convolutional  codes  here  to  show  that  tree  codes  may  be  realized 
with  a  minimum  of  equipment. 

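Define a basic sequence $g = (\underline{g}_1, \underline{g}_2, \ldots, \underline{g}_S, \underline{0}, \underline{0}, \ldots)$, called the code generator, where
each $\underline{g}_r$ is a subsequence of $\ell$ digits, and $S$ is called the code constraint length.
We also define translates of $g$ by

$$g_n = (\underline{0}, \underline{0}, \ldots, \underline{0}, \underline{g}_1, \underline{g}_2, \ldots, \underline{g}_S, \underline{0}, \ldots)$$

where $\underline{0}$ indicates a subsequence of $\ell$ zeros, of which $n - 1$ precede $\underline{g}_1$. Assume that the letters in
the generator $g$ and the letters of the source alphabet coincide and consist of the set of integers
$\{0, 1, \ldots, b - 1\}$, $b$ a prime. Then, the source sequence $\beta = (\beta_1, \beta_2, \ldots)$ generates the channel
sequence $u = (\underline{u}_1, \underline{u}_2, \ldots)$ by

$$u = \sum_{n=1}^{\infty} \beta_n\, g_n \tag{4}$$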


Multiplication and vector addition are taken modulo $b$. Following this rule the tree, partially
shown in Fig. 1, may be constructed from the code generator $g$ = (112, 010, 201, 221, 000, ...).
In particular, the source sequence $\beta$ = (1, 0, 2, ...) generates the channel sequence
$u$ = (112, 010, 122, ...).
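
The encoding rule of Eq. (4) can be sketched in a few lines. This is a minimal illustration, not the shift-register realization of Fig. 2; the function name and argument layout are chosen here for convenience. It uses the generator and source sequence quoted above and reproduces the first three branches of the channel sequence.

```python
def convolutional_encode(source, generator, b):
    """Encode per Eq. (4): u = sum_n beta_n * g_n, arithmetic modulo b.

    source:    list of source digits beta_1, beta_2, ...
    generator: list of the S nonzero-length blocks g_1, ..., g_S (tuples of
               ell digits); blocks beyond S are zero and are skipped.
    Returns the list of channel branches u_1, u_2, ...
    """
    ell = len(generator[0])
    encoded = []
    for q in range(len(source)):            # index of the output branch
        branch = [0] * ell
        for n in range(q + 1):              # source digits that can affect it
            shift = q - n                   # block of g aligned with branch q
            if shift < len(generator):
                block = generator[shift]
                for h in range(ell):
                    branch[h] = (branch[h] + source[n] * block[h]) % b
        encoded.append(tuple(branch))
    return encoded

g = [(1, 1, 2), (0, 1, 0), (2, 0, 1), (2, 2, 1)]
print(convolutional_encode([1, 0, 2], g, 3))
```

Each output branch depends on at most S source digits, which is why the encoder grows only linearly with the constraint length.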


Fig. 2. Convolutional encoder. (Figure label: encoded data.)


This code can be realized (see Fig. 2) with a standard shift register of $S$ stages (the code
constraint length), multipliers† and adders (modulo $b$). Clearly, the size of the convolutional
encoder does not increase faster than linearly in the code constraint length. Others have shown
that the probability of a decoding error with Sequential Decoding on convolutional codes decreases
exponentially in the code constraint length (for almost all codes). In a probability of error sense,
convolutional codes are near optimum.

† The circles in Fig. 2 indicate multiplication by the enclosed numbers.

This example has assumed that the source alphabet and channel alphabet are identical.
Neither this restriction nor the restriction that the alphabets contain the same number of
elements is needed (see Ref. 11). In addition, the constraint that $b$ be prime is not essential. For
example, $b$ may be a power of a prime and the components of $\beta$ and $g$ may be chosen as
elements of a general Galois field (Ref. 4), addition and multiplication taken in this field.

C.  FANO  ALGORITHM 

In preparation for a discussion of the Fano search procedure, we introduce and motivate
the "metric" with which the procedure operates.

1. Metric

Assume that a source generates a sequence of outputs $\beta$. This sequence directs a path $u$
through a tree code. The branches of this path are transmitted over a discrete memoryless
channel. A sequence of branches $v$ is received at the channel output. The Fano decoder is a
device that operates on this sequence and produces a replica of the transmitted sequence, unless
decoding errors occur.

The Fano decoder (or algorithm) is a rule for searching efficiently through the paths in the
tree code in an attempt to find a "best fit" with the received sequence $v$. To determine a "best
fit," values are assigned to nodes in the tree. The value of a node is said to be the value of the
metric between the path terminating on this node and the corresponding received sequence. As
the decoder searches nodes, values of the metric are compared to the criteria of Fig. 3. The
criteria are straight lines of zero slope separated by an amount $t_0$.


Let us be precise about the definition of metric. We define a "branch metric" and associate
a value of this branch metric with each branch of the tree.† Let $\underline{u}_q = (u_{q1}, u_{q2}, \ldots, u_{q\ell})$ be a tree
branch and let $\underline{v}_q = (v_{q1}, v_{q2}, \ldots, v_{q\ell})$ be the corresponding received branch. The branch metric
between $\underline{u}_q$ and $\underline{v}_q$, $d(\underline{u}_q, \underline{v}_q)$, is defined as

$$d(\underline{u}_q, \underline{v}_q) = \sum_{h=1}^{\ell} \left[ I(u_{qh}, v_{qh}) - R \right] \tag{5}$$

† This is not a metric in the mathematical sense because $d(\underline{u}_q, \underline{v}_q)$ may be negative.

where†

$$I(u_{qh}, v_{qh}) \triangleq \log_2 \frac{p[v_{qh}/u_{qh}]}{f(v_{qh})} \tag{6}$$

Here, $p[v_{qh}/u_{qh}] = p[y_j/x_k]$ when $v_{qh} = y_j$ and $u_{qh} = x_k$. We let $f(y_j)$ be a probability-like
function. It may be interpreted as the probability of channel output symbol $y_j$ when the channel
inputs are assigned probabilities $\{p_k\}$, $1 \leq k \leq K$. The function $f(y_j)$ and the probability assignment
$\{p_k\}$ will appear during the "random code" argument of Chapter IV and an interpretation will be
attached to $f(y_j)$ and $\{p_k\}$.

J  ^ 

The "path metric," $d(m, s, q)$, on the path containing $q + s$ branches and terminated by node
$(m, s, q)$, is defined as the sum of the branch metric on each of the $q + s$ branches. The value
of this path metric is associated with node $(m, s, q)$. When we plot $d(m, s, q)$ for paths in the
tree, we indicate the values of the path metric with nodes. The nodes in this plot have a
one-to-one correspondence to nodes in the tree and will be indexed with the same triplet $(m, s, q)$.

This definition of path metric is justified by two facts: it leads to a workable decoder and
this decoder can be studied analytically. The definition is recommended by the fact that a large
value of the path metric indicates that the path in question is very probable a posteriori (see
below), which is equivalent to saying that with high probability this path is the transmitted path.

We now show that the value of the metric is monotone increasing in the a posteriori probability
of a path.

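Let $u_n$, $n = q + s$, represent the tree path $(m, s, q)$ and let $v_n$ be the corresponding received
sequence. Then, the value of the metric on $u_n$ is

$$d(m, s, q) = \sum_{r=1}^{n} \sum_{h=1}^{\ell} \left[ I(u_{rh}, v_{rh}) - R \right] = \log_2 \frac{P_R[v_n/u_n]}{f(v_n)} - n \ell R \tag{7}$$

where $u_{rh}$, $v_{rh}$ are the $h$th digits on the $r$th branch of $u_n$, $v_n$, respectively, and

$$f(v_n) \triangleq \prod_{r=1}^{n} \prod_{h=1}^{\ell} f(v_{rh}) \tag{8}$$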

In obtaining Eq. (7), we have used Eqs. (5) and (6), together with the definition of $P_R[v_n/u_n]$,
Eq. (2). Now, $P_R[v_n/u_n]$, the conditional probability that $v_n$ is received when $u_n$ is transmitted,
is proportional to $P_R[u_n/v_n]$, the a posteriori probability of $u_n$, since (from Bayes' Rule)

$$P_R[u_n/v_n] = P_R[v_n/u_n]\, \frac{P_R[u_n]}{P_R[v_n]} \tag{9}$$

and $P_R[u_n]$, the a priori probability of $u_n$, is constant under variation of $u_n$. (We have assumed
that successive source digits are statistically independent and identically distributed.) Thus, we
have established for the given source that the path of $n$ branches with the largest value of the
metric is that path of $n$ branches which is a posteriori most probable.
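
The equivalence between ranking equal-length paths by metric and ranking them by likelihood can be checked numerically. The sketch below makes several assumptions that are not part of the text: a ternary symmetric channel with a hypothetical crossover probability `eps`, a uniform $f(y) = 1/3$, and three candidate paths that correspond to the source sequences (1, 0, 2), (0, 0, 0) and (2, 1, 2) under the example generator of Section B.

```python
import math

def path_metric(u, v, p, f, R):
    """Eq. (7): sum over all digits of [log2(p(y/x) / f(y)) - R]."""
    total = 0.0
    for u_r, v_r in zip(u, v):
        for x, y in zip(u_r, v_r):
            total += math.log2(p[x][y] / f[y]) - R
    return total

def likelihood(u, v, p):
    """P_R[v_n / u_n] of Eq. (2)."""
    prob = 1.0
    for u_r, v_r in zip(u, v):
        for x, y in zip(u_r, v_r):
            prob *= p[x][y]
    return prob

eps = 0.05                                   # hypothetical crossover probability
p = [[1 - eps if x == y else eps / 2 for y in range(3)] for x in range(3)]
f = [1 / 3] * 3                              # assumed output probabilities
R = math.log2(3) / 3                         # Eq. (3) with b = 3, ell = 3

v = [(1, 0, 2), (0, 1, 0), (1, 2, 2)]        # received sequence
candidates = [
    [(1, 1, 2), (0, 1, 0), (1, 2, 2)],       # source (1, 0, 2)
    [(0, 0, 0), (0, 0, 0), (0, 0, 0)],       # source (0, 0, 0)
    [(2, 2, 1), (1, 0, 2), (0, 0, 0)],       # source (2, 1, 2)
]

by_metric = max(candidates, key=lambda u: path_metric(u, v, p, f, R))
by_likelihood = max(candidates, key=lambda u: likelihood(u, v, p))
print(by_metric == by_likelihood)
for u in candidates:
    print(round(path_metric(u, v, p, f, R), 3), likelihood(u, v, p))
```

Because $f(v_n)$ and $n \ell R$ are the same for every path of $n$ branches, the metric differs from the log-likelihood only by a constant, so the two orderings must coincide.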


† If output $y$ occurs with probability $f(y)$, then $I(x, y)$ is the "mutual information" between $x$ and $y$.


We have attached a value of the branch metric to each of the $b$ branches stemming from a
node. We observe, by analogy with the argument above, that of these branches, that branch with
the largest value of the branch metric is the a posteriori most probable branch at that node.
Then, we order branches at a node according to their value of the branch metric and say that
they are "most probable," "second most probable," etc.

We consider next the motivation for and definition of the Fano algorithm.

2.  Search  Procedure 

Sequential Decoding procedures in general, and the Fano algorithm in particular, are
motivated by the following consideration: For a properly chosen code and for signaling rates which
are suitably restricted, the a posteriori probability of the correct path and the value of the path
metric on it will typically increase (see Fig. 3). On the contrary, any incorrect path branching
from the correct path will typically decrease in path metric (see Fig. 3). Thus, a separation
will typically occur between the correct path and some incorrect path. Using a set of thresholds,
a decoder can eliminate from consideration a large number of improbable, hence, incorrect
paths. As long as the channel "noise" is not too severe, the separation between the correct and
incorrect paths will become increasingly evident. A period of high channel noise, however, may
force a large amount of searching and even cause decoding errors. We shall consider these two
points later.

The set of rules for searching tree paths which we shall consider here is known as the Fano
Sequential Decoding Algorithm. A logical flow chart of this procedure† is given in Fig. 4. To
describe the operation of this algorithm we introduce the notions of forward mode and search
mode operations. The machine operates in the forward mode when it is searching for the first
time a path whose metric is nondecreasing. (We shall be more precise about this point later.)
Roughly speaking, the machine operates in the search mode when it is looking for a path which
has a continuously growing metric.

Fig. 4. Flow chart of Fano algorithm.

† See Ref. 8 for the flow chart of the computer program which realizes the chart of Fig. 4. The
bookkeeping required by D of Fig. 4 is accomplished with a small number of instructions in the
computer program. This chart is based on a flow chart suggested by Professor I. M. Jacobs.

Let us now be specific. Suppose that the decoder is following a path which is growing in
metric and that this path is being followed for the first time so that the machine is operating in
the forward mode. Then, at each node of this path the decoder raises a threshold, called the
running threshold $T$, in units of $t_0$ until it lies just below the value of the path metric at each
node. In Fig. 4 this operation is performed by D. After the threshold is tightened at a node, the
decoder looks forward along the "most probable" branch (that one which has the largest value of
the branch metric). If the path metric on the extended path remains above the existing value of
the running threshold $T$, and if the extended path is examined for the first time, forward mode
operation continues. If the extended path falls below $T$, as in Fig. 5, search mode operation
begins. Operation B of Fig. 4 is then performed.
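
The threshold-tightening operation can be sketched as a one-line computation. The function below is an illustration only; its name is chosen here, and it adopts the assumption that a node whose metric equals the threshold exactly is still counted as lying above it, so that the tightened threshold satisfies T <= metric < T + t0.

```python
import math

def tighten(T, metric, t0):
    """Raise the running threshold T in steps of t0 to the largest value that
    does not exceed the path metric. Assumes metric >= T on entry; the
    treatment of an exact tie (metric == T) is a convention chosen here."""
    return T + t0 * math.floor((metric - T) / t0)

print(tighten(0.0, 5.3, 2.0))   # two steps of t0 fit below the metric
print(tighten(4.0, 4.5, 2.0))   # no further step fits; T is unchanged
```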


Fig. 5. Typical machine search. (Figure labels: running threshold; enter and leave search mode.)


When the machine enters B it is looking for a path which will remain above $T$. Hence, it
looks back to the preceding node to determine whether it remains above $T$. If so (OK), perhaps
the "next most probable" branch extending forward from the original node will remain above $T$.
At E, the machine determines whether a "next most probable" node exists, and if not, it looks
back again with the same intention, that is, of finding an extendable path. If in looking forward
in C the machine finds that the extended path remains above $T$, it steps forward, tightening the
running threshold if this node is visited for the first time. (This threshold is tightened and the
machine enters or remains in the forward mode only when a node is examined for the first time.
Otherwise, looping would occur.) If the forward look in C is successful, the machine steps
forward and continues to look forward, as indicated by Fig. 5. If the forward look in C is
unsuccessful, the machine again looks back in search of a node from which an extendable path may
be found (i.e., a sequence of nodes which remains above $T$). If an extendable path cannot be
found, that is, if every sequence remaining above $T$ and connected to the node at which searching
begins eventually crosses $T$, then the running threshold $T$ must be reduced. After the threshold
is reduced, the decoder looks forward along "most probable" branches until it reaches the node
at which it entered the search mode. The branch on which the decoder looks forward is a new
branch, so that the threshold may be increased if this extended path lies above $T$ (see Fig. 6).
Fig. 6. Threshold reduction, b = 2.

Fig. 7. Branch examination with a threshold.

Fig. 8. Minimum threshold T^.

The machine operation may be summarized as follows: The decoder operates in the forward
mode, extending along "most probable" branches and increasing the running threshold as it
progresses, until an extension fails the running threshold $T$. At this point, search mode operation
begins and the decoder looks for a sequence of nodes which remains above $T$. If each sequence
of nodes connected to the node at which search mode operation began is such that it crosses $T$
before forward mode operation resumes, then $T$ is reduced. As soon as the decoder finds a
new path remaining above the existing value of $T$, forward mode operation begins and $T$ may
be increased.
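
The summary above can be turned into a short program. The sketch below is not a transcription of the flow chart of Fig. 4, which is not reproduced here; it follows the commonly described form of the Fano algorithm and should be read as one possible realization. Several things are assumptions made for the example: a ternary symmetric channel with hypothetical crossover probability `eps`, a uniform $f(y) = 1/b$, the first-visit test "metric of the node just left is below T + t0", and all function and variable names.

```python
import math

def branch_for(prefix, digit, generator, b):
    """Channel branch produced when `digit` extends the source prefix
    (convolutional code, arithmetic modulo b)."""
    source = list(prefix) + [digit]
    q, ell = len(prefix), len(generator[0])
    branch = [0] * ell
    for n, beta in enumerate(source):
        shift = q - n
        if shift < len(generator):
            for h in range(ell):
                branch[h] = (branch[h] + beta * generator[shift][h]) % b
    return tuple(branch)

def branch_metric(u_q, v_q, eps, b, R):
    """Branch metric on an assumed b-ary symmetric channel with f(y) = 1/b."""
    total = 0.0
    for x, y in zip(u_q, v_q):
        p = 1 - eps if x == y else eps / (b - 1)
        total += math.log2(p * b) - R
    return total

def fano_decode(received, generator, b, eps, t0):
    """Return (decoded source digits, number of forward looks)."""
    n = len(received)
    R = math.log2(b) / len(generator[0])
    path, ranks, metrics = [], [], [0.0]   # current path and its node metrics
    T, rank, forward_looks = 0.0, 0, 0
    while True:
        forward_looks += 1
        depth = len(path)
        # Order the b branches leaving the current node, most probable first.
        ranked = sorted(
            ((branch_metric(branch_for(path, d, generator, b),
                            received[depth], eps, b, R), d) for d in range(b)),
            key=lambda item: (-item[0], item[1]))
        step, digit = ranked[rank]
        forward = metrics[-1] + step
        if forward >= T:
            # Step forward; tighten T only on a first visit.
            first_visit = metrics[-1] < T + t0
            path.append(digit)
            ranks.append(rank)
            metrics.append(forward)
            if len(path) == n:
                return path, forward_looks
            if first_visit:
                T += t0 * math.floor((forward - T) / t0)
            rank = 0
            continue
        # Search mode: look back until another branch can be tried,
        # or reduce the threshold if the preceding node lies below T.
        while True:
            if path and metrics[-2] >= T:
                last_rank = ranks.pop()
                path.pop()
                metrics.pop()
                if last_rank + 1 < b:
                    rank = last_rank + 1   # try the next most probable branch
                    break
            else:
                T -= t0
                rank = 0
                break

g = [(1, 1, 2), (0, 1, 0), (2, 0, 1), (2, 2, 1)]
clean = [(1, 1, 2), (0, 1, 0), (1, 2, 2)]
noisy = [(1, 0, 2), (0, 1, 0), (1, 2, 2)]   # one digit of the first branch in error
print(fano_decode(clean, g, 3, 0.05, 2.0))
print(fano_decode(noisy, g, 3, 0.05, 2.0))
```

With the error-free sequence the decoder never leaves the forward mode. With one digit in error it must first reduce the threshold at the origin before it can proceed, which is a small instance of the extra computation caused by channel noise.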

D.  COMPUTATION 

We now establish that the decoder does not look forward or back on any given branch more
than once with each value of the running threshold. There are three situations which need to
be considered. There is a node at each end of the given branch. We need to consider the case
where both nodes lie above a given threshold, and where either the preceding or following node
lies below the given threshold. If both nodes fall below some threshold, the branch considered
will not be examined with this threshold.

If the node preceding the branch in question lies above the given threshold, while the following
node lies below this threshold (see a of Fig. 7), then the decoder may look forward on this
branch, but it cannot look back because it would have to step forward to do so. But from A of
Fig. 4, it cannot step forward while this threshold is in effect. Next consider the situation of
b in Fig. 7. The decoder can look back on the given branch, but it cannot look forward because
it would have to step back to do so, which is prevented by the restriction OK in B of Fig. 4. The
third situation to be considered is that of c in Fig. 7. Both nodes terminating the branch in
question lie above the given threshold. With this threshold the decoder may look forward and then
step forward (A of Fig. 4) from the preceding to the following node. The decoder may then search
forward and later return to the second node with this same threshold. We now show that the
decoder cannot return to the first node and then retrace this branch. We observe from B, E, and
C of Fig. 4 that this branch with the given threshold cannot be retraced because the decoder can
extend only along either the "next most probable" branch at the first node, or along the "next
most probable" branch at an earlier node. The decoder can only retrace the original branch by
exiting from B on BAD (Fig. 4) and lowering the threshold. Thus, with any given threshold any
particular branch cannot be examined in the forward and reverse directions more than once.

Now let us consider the lowest threshold which is used by the decoder. Consider paths
branching from the $q^{\text{th}}$ node of the correct path and terminating on nodes labeled (m, s, q),
$1 \le m \le M(s)$, $0 \le s < \infty$. Let D be the correct path minimum at or following the
$q^{\text{th}}$ node and let $T_D$ be the threshold just below D (see Fig. 8).† Assume that the received
path is decoded correctly, that is, that decoding errors are not made. Then paths which cross
$T_D$ will not be examined beyond the point at which they cross $T_D$. This is true since threshold
$T_D - t_0$ is used only if all paths fall below $T_D$; but by definition the correct path remains
above $T_D$. This implies that the decoder will not step forward to a node which lies below $T_D$
nor to any node connected to and following such a node (see (m, s, q) of Fig. 8).

† Since the decoder operation depends only on incremental values of the metric, we may assume
that the $q^{\text{th}}$ correct node lies between $T_0$ and $T_1$, and measure D and $T_D$ from $T_0 = 0$.

We may also deduce that if D < 0 and all nodes connecting any node such as (m', s', q) in
Fig. 8 to (1, 0, q) [including (m', s', q)] lie above $T_D + t_0$, then the decoder must look forward
from (m', s', q) before the threshold is reduced to $T_D$. (The constraint D < 0 is necessary
because if D > 0 the machine may never be forced back to (1, 0, q) so that forward or backward
looks from (m, s, q) may never occur.)

The two central results of the last three paragraphs may be summarized as follows:

(1) Consider a node (m, s, q) branching from the $q^{\text{th}}$ node of the correct
path. Let D be the correct path minimum on or following the $q^{\text{th}}$
node. Let $T_D$ be the threshold just below D. Assume that node
(m, s, q) lies between thresholds $T_{n-1}$ and $T_n$, where $T_n \ge T_D$, as in
Fig. 9. Let $N_k$ be the number of forward or backward looks from
this node with threshold $T_k$. Then, for each threshold $T_k \ge T_D$ and
$T_k < T_n$, we have

$0 \le N_k \le b + 1$

$N_k$ is zero for any other threshold. The lower limit represents a
situation of the type represented by (m, s, q) in Fig. 8; in this case,
the machine does not look forward or backward from the node in
question.

The conditions under which $N_k = 0$ and the bounds on $N_k$ in Eq. (10)
are central to the arguments of Chapter IV, which is concerned with
overbounding the statistics of the decoder behavior.

(2) Consider a node such as (m', s', q) of Fig. 8. This node remains
above $T_D + t_0$ and is connected to (1, 0, q) through a set of nodes
all of which lie above $T_D + t_0$. If D < 0, the decoder must look
forward at least once from this node before the threshold T is
reduced to $T_D$ (to which it must be reduced, since the decoded path
is the correct path and this path lies below $T_D + t_0$ at some point).

The conditions under which the decoder must look forward at least once from node (m', s', q)
are central to the arguments of Chapter III, which is concerned with underbounding the statistics
of the behavior of the decoder.

We  shall  call  the  number  of  forward  and  backward  looks  at  a  node  the  "computation"  at  this 
node.  These  looks  are  the  operations  which  require  machine  time.  In  the  remainder  of  this 
report,  we  use  this  definition  of  computation  to  investigate  the  computational  demands  of  the 
decoder. 


E.  BUFFER  AND  DYNAMICS  OF  DECODER 


In the previous section, we assumed implicitly that the decoder is capable of searching back
indefinitely into the tree in the process of decoding. Although this assumption will be needed for
later analysis, it is not consistent with a physical machine. To search back indefinitely requires
that all received branches be stored in the decoder. Practical limitations on the cost and size
of the decoder force one to consider buffers for storage which are of finite size. We shall
consider now a particular buffer realization and discuss the dynamics of the decoder operation.

Fig. 10. Buffer.

Assume  that  the  decoder  operates  with  the  buffer  of  Fig.  10.  Received  branches  are  inserted 
at the left end of the buffer and progress through the buffer at the rate at which they arrive. The
buffer  stores  B  branches.  Below  each  branch  there  is  space  to  register  an  element  of  the  source 
alphabet.  As  the  decoder  operates,  it  inserts  into  these  places  tentative  source  digit  decisions. 
Insertions  are  made  at  the  position  of  the  "search"  pointer.  When  these  tentative  decisions  reach 
the  left-hand  side  of  the  safety  zone  they  are  considered  to  be  final.  When  they  reach  the  right- 
hand  side  of  the  safety  zone  they  are  considered  to  have  been  decoded.  If  a  digit  released  from 
the  right  end  of  the  safety  zone  disagrees  with  the  corresponding  source  output  digit,  a  decoding 
error  is  said  to  have  occurred. 

The "search" pointer indicates the received branch at which the decoder is looking. The
"extreme" pointer indicates the most recently received branch that has been examined. As the
machine operates the two pointers may advance together toward the left-hand side of the buffer
until a search is necessary. At that time the search pointer and the extreme pointer will drift
back, the search pointer moving away from the extreme pointer. (When the extreme pointer is
not moving forward, it drifts back because branches are arriving at a constant rate.) As the
search pointer moves back it erases previous tentative decisions, and in moving forward it
introduces new tentative decisions. Digits in the safety zone cannot be changed.

It has been found from simulation that under normal operating conditions the two pointers
usually hover near the left-hand side of the buffer. Occasionally, however, they will drift back
a substantial distance. During this drift, the two pointers usually are separated by a small
fraction of the distance they have drifted from the buffer end. (This behavior is rationalized by the
observation that the number of machine computations tends to grow exponentially with the depth
of the search from the extreme point.)

Occasionally, the search pointer reaches the far end of the buffer. Then, the decoder is
likely to release an incorrect digit into the safety zone; thereafter, the decoder tries to advance
on an incorrect tree path. Since this is difficult, the machine must do a large amount of
computation. The search pointer then hovers near the far end and additional erroneous source digits
are released into the safety zone. Thus, if the search pointer is forced to the far end of the
buffer, it will tend to remain at this end and to decode in error. We call this event buffer
overflow. This report is motivated by a concern for this event.
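
The drift of the pointers and the resulting overflow can be illustrated with a deliberately simplified queue model. The sketch below is not the decoder of Fig. 10: it ignores the search pointer and the safety zone, and it assumes the number of computations needed for each branch is simply given. The names `first_overflow`, `computations`, `speed` and `buffer_size` are hypothetical and introduced only for this illustration.

```python
def first_overflow(computations, speed, buffer_size):
    """Return the index of the branch arrival at which the backlog of
    undecoded branches first exceeds buffer_size, or None if it never does.

    computations[i] -- computations needed to decode branch i (assumed given)
    speed           -- computations the decoder performs per branch arrival
    buffer_size     -- B, the number of branches the buffer can hold
    """
    pending = 0        # computations still owed on the branch in progress
    next_branch = 0    # index of the next branch the decoder must finish
    for arrived in range(1, len(computations) + 1):
        budget = speed
        # Spend this arrival interval's budget on branches already received.
        while next_branch < arrived and budget > 0:
            if pending == 0:
                pending = computations[next_branch]
            done = min(pending, budget)
            pending -= done
            budget -= done
            if pending == 0:
                next_branch += 1
        backlog = arrived - next_branch   # branches waiting in the buffer
        if backlog > buffer_size:
            return arrived - 1
    return None
```

In this toy model a single branch that needs 100 computations overflows a buffer of three branches at the same arrival whether `speed` is 2 or 4, which is in the spirit of the insensitivity to machine speed discussed in this section; it is an illustration under the stated assumptions, not a substitute for the analysis.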

Although decoding errors may occur without causing a large machine computation, it is
noted from simulation (Ref. 13) (and may be rationalized heuristically) that for safety zones of moderate
size, decoding errors are almost always preceded by overflow. The heuristic argument states,
in effect, that the noise sequences which are responsible for errors in the absence of overflow
occur with vanishingly small probability, especially for safety zones of large capacity.

Since buffer overflow can be detected, the decoder can discard the unreliable digits in the
safety zone. Thus, the probability that an erroneous digit is released to the user before the
buffer overflows can be made very small, much smaller than the probability of overflow. This
observation is equivalent to the statement that the probability of a machine failure, where
failure means overflow or error, is dominated by the probability of buffer overflow. Represent this
probability by $P_{BF}(N)$. We define $P_{BF}(N)$ as the probability that the first buffer overflow occurs
on or before the time at which the $N^{\text{th}}$ source decision enters the safety zone.

We shall be concerned in this report with the sensitivity of $P_{BF}$ to buffer size B, to the
speed of the decoder and to the data rate R. We shall find that $P_{BF}$ is relatively insensitive to
buffer size and machine speed, but quite sensitive to data rate. We shall establish the mechanism
which is responsible for the particular sensitivities of $P_{BF}$. Throughout, we assume that the
decoder is working with a fixed channel.

A preliminary statement can be made here concerning the largest signaling rate R at which
$P_{BF}$ is "small" or at which the decoder will function well. Others (Refs. 7, 8, 13, 14) have shown, through
analysis and simulation, that the largest rate at which the average computation per decoded digit
is small is a rate called $R_{\text{comp}}$. Since large average computation implies frequent buffer
overflows, $R_{\text{comp}}$ is an upper limit on the rate at which the machine will function properly. $R_{\text{comp}}$
is strictly less than channel capacity, except for pathological channels, and is a large fraction
of channel capacity for many but not all channels.

F. "STATIC" COMPUTATION

Unfortunately, the statistics of the dynamical computation performed by the Fano decoder
as it operates in time are too difficult to study directly through analysis. Consequently, we are
led to consider a kind of computation called "static" computation which is at once analytically
tractable and closely connected to the real machine computation. Through an investigation of
"static" computation, we shall be able to make strong qualitative statements about the
sensitivities of $P_{BF}$.

A restriction to the study of "static" computation has been found necessary without exception
by all others who have investigated the Fano algorithm (Refs. 14-16). By "static" computation we mean
a computation which is eventually performed by the decoder, if no digits are decoded in error
and if the buffer is infinite. Thus, the assumptions are that the decoder has a buffer of infinite
capacity, that it has operated for an indefinite length of time, and that it has decoded correctly.

Let (m, s, q) be a node of the $q^{\text{th}}$ incorrect subset where $1 \le m \le M(s)$, $0 \le s < \infty$, and M(s)
is given by

$M(0) = 1$

$M(s) = (b - 1)\, b^{s-1}$, for $s \ge 1$   [Eq. (1)]

We define "static" computation associated with the $q^{\text{th}}$ correct node as the number of
computations made on each node (m, s, q) of the $q^{\text{th}}$ incorrect subset.

The  connection  between  "static"  computation  and  the  probability  of  overflow  will  be  made 
later. 

CHAPTER III

LOWER  BOUND  TO  DISTRIBUTION  OF  COMPUTATION 


In this chapter, we underbound the cumulative probability distribution of the random variable
of "static" computation C, namely, $\Pr[C \ge L]$. This underbound applies to discrete,
memoryless channels (DMC) which are completely connected (all channel transition probabilities are
strictly positive). We show that this lower bound is an algebraic function of the distribution
parameter L for large L; that is, $\Pr[C \ge L] \ge A/L^{\alpha}$ for all L greater than some constant
$L_0$, where $A, \alpha > 0$.

The lower bound derivation is preceded by a discussion of the condition on the random
variable of "static" computation which is responsible for its having an algebraic distribution function.
Roughly speaking, this condition states that the distribution is algebraic if "static" computation
is large during an interval of high channel noise and grows exponentially with the length of such
an interval. This important result is responsible for the particular sensitivities of the overflow
probability mentioned in Chapter II.

A.  BEHAVIOR  OF  DISTRIBUTION  OF  COMPUTATION 

The computation performed by the Fano decoder is a random variable. It is large during
periods of high channel noise and small otherwise. The same is true of the random variable of
"static" computation C associated with the $q^{\text{th}}$ node of the correct path. We now argue
somewhat loosely that exponential growth of "static" computation implies that it has an algebraic
distribution function.

Let $\xi_s$ be the sequence of $s\ell$ channel transitions (corresponding to s tree branches) following
the $q^{\text{th}}$ correct node. The sequence $\xi_s$ alone is not sufficient, as a rule, to determine C
completely. Knowledge of $\xi_s$ is sufficient, however, to determine whether C is large or not.
If $\xi_s$ for large s represents a long interval of high channel noise, then C will still be random,
but all values in its range of values will be very large. In particular, let us assume that for
each $s > s_0$ there exists a $\xi_s$ such that $C \ge A_0 2^{\theta s}$, where $A_0, \theta > 0$; that is, the "static"
computation grows exponentially with the length of an interval of high channel noise. (Following
arguments similar to those of the next section, it may be verified that such an assumption holds for
all codes on the completely connected DMC.)

We have

$\Pr[C \ge L] \ge \Pr[C \ge L \mid \xi_s] \, \Pr[\xi_s]$   (10)

where $\Pr[\xi_s]$ is the probability that the particular sequence $\xi_s$ of $s\ell$ channel transitions is the
sequence of $s\ell$ transitions following the $q^{\text{th}}$ reference node. Both $\xi_s$ and s in Eq. (10) are
arbitrary. For each s let $\xi_s$ be a high channel noise sequence. Now choose s such that

$A_0 2^{\theta s} \ge L > A_0 2^{\theta (s-1)}$.   (11)

Then, for this s and the high channel noise sequence $\xi_s$, we have by assumption that
$C \ge A_0 2^{\theta s}$. Therefore, from Eq. (11), $C \ge L$, which implies that $\Pr[C \ge L \mid \xi_s] = 1$. Thus, for
the particular value of s defined by Eq. (11) and for the high channel noise sequence $\xi_s$ of that
length, we have

$\Pr[C \ge L] \ge \Pr[\xi_s]$.   (12)

For the completely connected DMC (the only channels considered in this chapter) we have

$\Pr[\xi_s] \ge 2^{-s\varphi}$   (13)

where $\varphi \triangleq -\ell \log_2 \min_{j,k} p[y_j/x_k]$, because $\Pr[\xi_s]$ is the product of $s\ell$ channel transition
probabilities, none of which is smaller than the smallest transition probability, the latter being
nonzero by the connectedness assumption. Combining Eqs. (11) and (13) we have the following
lower bound to $\Pr[C \ge L]$. The bound applies only for $s > s_0$ or $L > A_0 2^{\theta s_0}$.

$\Pr[C \ge L] \ge \left( \frac{A_0}{L} \right)^{\varphi/\theta} 2^{-\varphi}$  for  $L > L_0 \triangleq A_0 2^{\theta s_0}$.   (14)

Exponential growth of computation with the length of an interval of high channel noise implies
that the distribution of "static" computation is algebraic, which in turn implies the particular
sensitivities of the overflow probability discussed in Chapter II. The existence of exponential
growth is, therefore, a most important characteristic (or defect) of a decoding scheme.
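
The step from exponential growth to an algebraic tail can be checked numerically. The sketch below assumes the reading of Eqs. (11), (13) and (14) in which C is at least A0 * 2**(theta * s) on a noisy sequence of s branches and such a sequence has probability at least 2**(-phi * s); the function names and the sample parameter values are hypothetical and chosen only for illustration.

```python
import math

def sequence_length_for(L, A0, theta):
    """Smallest integer s with A0 * 2**(theta * s) >= L, as in Eq. (11)."""
    return max(0, math.ceil(math.log2(L / A0) / theta))

def tail_lower_bound(L, A0, theta, phi):
    """Algebraic bound of the form of Eq. (14): (A0 / L)**(phi/theta) * 2**(-phi)."""
    return (A0 / L) ** (phi / theta) * 2.0 ** (-phi)

# Illustrative (assumed) parameters: growth exponent theta, noise cost phi.
A0, theta, phi = 1.0, 1.0, 2.0
L = 1000.0
s = sequence_length_for(L, A0, theta)
prob_of_noise_burst = 2.0 ** (-phi * s)   # right-hand side of Eq. (13)
```

With these values the probability of the noise burst, 2**(-phi * s), is no smaller than the algebraic expression, and doubling L multiplies the bound by the fixed factor 2**(-phi/theta), which is the signature of an algebraic rather than exponential tail.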

B.  LOWER  BOUND  ARGUMENT 

Our intention in this section is to underbound, without a loss of rigor, the probability
$\Pr[C \ge L]$. To underbound $\Pr[C \ge L]$, we find an event which implies that $C \ge L$. The
probability of the former event underbounds the probability that $C \ge L$ and is used as the underbound
to $\Pr[C \ge L]$. As a preliminary, we recall some of the definitions and statements of Chapter II.

"Static" computation associated with the $q^{\text{th}}$ incorrect subset is defined as the number of
forward or backward "looks" required by the Fano decoder on the reference node (the $q^{\text{th}}$ correct
node) or on nodes in the $q^{\text{th}}$ incorrect subset. "Static" computation is measured under the
assumption that the decoder decodes without error, that the $q^{\text{th}}$ correct node is in the infinite past
of the decoding process, and that the buffer has infinite storage capacity. The latter assumption
is equivalent to the assumption that the machine can search forward or backward to any length
in the tree.

A node in the $q^{\text{th}}$ incorrect subset is labeled (m, s, q) to indicate that it is at penetration s
in this subset (there are s branches between it and the reference node) and it is $m^{\text{th}}$ in order
among the M(s) nodes at that penetration in the $q^{\text{th}}$ incorrect subset; M(s) is given below.

$M(0) = 1$

$M(s) = (b - 1)\, b^{s-1}$  for $s \ge 1$.   [Eq. (1)]

There are $b^t$ nodes at penetration t or less, since

$\sum_{s=0}^{t} M(s) = 1 + (b - 1) + (b - 1)\, b + \ldots + (b - 1)\, b^{t-1}$

$= 1 + (b - 1)(1 + b + b^2 + \ldots + b^{t-1})$

$= 1 + (b - 1) \frac{b^t - 1}{b - 1} = b^t$.   (15)

The reference node is labeled (1, 0, q) and is said to be at penetration zero in the $q^{\text{th}}$ incorrect
subset.
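
The count in Eq. (15) is easy to confirm by direct summation. The following is a minimal sketch; the function names are hypothetical, and b is taken to be the number of branches leaving each node.

```python
def nodes_at_penetration(b, s):
    """M(s) of Eq. (1): nodes at penetration s in one incorrect subset."""
    return 1 if s == 0 else (b - 1) * b ** (s - 1)

def nodes_up_to(b, t):
    """Total number of nodes at penetration t or less, i.e. the sum in Eq. (15)."""
    return sum(nodes_at_penetration(b, s) for s in range(t + 1))
```

For example, with b = 2 and t = 5 the sum is 1 + 1 + 2 + 4 + 8 + 16 = 32, which equals 2 to the fifth power.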

A path metric is defined and the value of the path metric on a path terminated by node
(m, s, q) is associated with node (m, s, q) and is called d(m, s, q). Let $u_n$ represent the path of
n = q + s branches terminated by (m, s, q) and let $v_n$ be the corresponding portion of the received
sequence.† Then, d(m, s, q) is defined as

$d(m, s, q) \triangleq \sum_{r=1}^{n} \sum_{h=1}^{\ell} \left[ \log_2 \frac{p[v_{rh}/u_{rh}]}{f(v_{rh})} - R \right]$   [Eqs. (6), (7)]

where $u_{rh}$ and $v_{rh}$ are the $h^{\text{th}}$ of $\ell$ digits on the $r^{\text{th}}$ branches of $u_n$, $v_n$, respectively,
$p[v_{rh}/u_{rh}]$ is a channel transition probability and $f(v_{rh})$ is a probability-like function which is
interpreted as the probability of the channel output digit $v_{rh}$ when channel inputs are assigned
probabilities $\{p_k\}$, $1 \le k \le K$.
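
As a concrete illustration of the path metric, the sketch below evaluates the double sum digit by digit. The data layout is an assumption made for this example only: a path is a list of branches, each branch a tuple of digits, `p[(y, x)]` holds the transition probability of output y given input x, and `f[y]` holds the probability-like function. None of these names come from the report.

```python
import math

def path_metric(u, v, p, f, R):
    """Sum over all branches and digits of log2(p[v|u] / f(v)) - R.

    u, v -- transmitted-path and received sequences, as lists of branches,
            each branch a tuple of channel digits (assumed layout)
    p    -- dict mapping (output, input) to a transition probability
    f    -- dict mapping output to the probability-like function value
    R    -- rate in bits per channel digit
    """
    total = 0.0
    for u_branch, v_branch in zip(u, v):
        for x, y in zip(u_branch, v_branch):
            total += math.log2(p[(y, x)] / f[y]) - R
    return total
```

On a binary symmetric channel with crossover probability 0.25, f equal to 1/2 for both outputs and R = 0.25, each correctly received digit adds log2(1.5) - 0.25 to the metric and each digit received in error adds -1.25, so the metric climbs slowly along a clean path and drops sharply where the path and the received sequence disagree.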

As the Fano decoder operates, it attempts to extend along a path which increases in path
metric. A set of thresholds $T_i = i\, t_0$, $-\infty < i < \infty$, is used to ascertain whether a path being
examined grows or decreases in metric. The decoder operation depends on increments in the
path metric. Thus, we may assume that the reference node (1, 0, q) lies between $T_0 = 0$ and
$T_1 = t_0$, i.e., $0 \le d(1, 0, q) < t_0$.

Our intent is to find an event which implies that $C \ge L$ and to underbound the probability of
this event. It was observed in Chapter II that if D is defined as the minimum value of the
correct path metric at or following (1, 0, q), and $T_D$ is the threshold just below D, then at least one
computation (a forward look) is required on node (m, s, q) and on each node connecting it to
(1, 0, q) if D < 0, and node (m, s, q) and all nodes connecting it to (1, 0, q) lie above $T_D + t_0$.
One forward look on node (m, s, q) and each of the connecting nodes is required under these
conditions before the decoder reduces the running threshold from $T_D + t_0$ to $T_D$. This latter
threshold is used at least once since the decoded path is the correct path (by assumption) and
this path dips below $T_D + t_0$ (see Fig. 11).

We assume that the channel is completely connected. This implies that the path terminated
by some node (m, t, q) cannot fall from the value of the metric on the reference node, d(1, 0, q),
with a slope‡ of magnitude larger than $\ell (R - I_{\min})$, where§

$I_{\min} \triangleq \min_{j,k} \log_2 \frac{p[y_j/x_k]}{f(y_j)}$.   (16)

That is,

$d(m, t, q) \ge d(1, 0, q) - t \ell (R - I_{\min})$.   (17)

† The subscript n on u or v is reserved for sequences of n branches measured from the origin. The subscript s
on u or v will indicate sequences of s branches measured from the $q^{\text{th}}$ correct node.

‡ Slope is defined as the increment in the metric for a one-node change in path penetration.

§ It may be shown that $I_{\min} < 0$.
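
The quantity I_min and the floor of Eq. (17) can be computed directly from the channel description. This is a sketch under the same assumed data layout as before (`p[(y, x)]` for transition probabilities, `f[y]` for the probability-like function); the function names are hypothetical. It requires every transition probability to be strictly positive, which is exactly the completely connected assumption.

```python
import math

def i_min(p, f):
    """Smallest per-digit value of log2(p[y|x] / f(y)), as in Eq. (16)."""
    return min(math.log2(prob / f[y]) for (y, x), prob in p.items())

def metric_floor(d_ref, t, ell, R, p, f):
    """Lowest metric reachable t branches beyond a node of metric d_ref,
    i.e. the right-hand side of Eq. (17)."""
    return d_ref - t * ell * (R - i_min(p, f))
```

For a binary symmetric channel with crossover probability 0.25 and f equal to 1/2, I_min is log2(0.25 / 0.5) = -1, so with two digits per branch and R = 0.25 a path can lose at most 2.5 units of metric per branch.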

Fig. 11. Trajectories of correct path and incorrect path.

We are now prepared to describe an event which implies $C \ge L$. As shown by Eq. (15),
there are $b^t$ nodes at penetration t or less in the $q^{\text{th}}$ incorrect subset. If each of these $b^t$ nodes,
$b^t \ge L > b^{t-1}$, lies above some threshold $T_i$, and if the correct path falls below $T_i$ at some node
beyond (1, 0, q), say at node (1, 0, q + s) [which is s branches removed from (1, 0, q)], we find
that the "static" computation on just the $b^t$ nodes of the incorrect subset equals or exceeds L
(since t is defined by $b^t \ge L > b^{t-1}$) so that the total "static" computation C equals or exceeds L.

We have the desired underbound if we let $T_i$ be the threshold below the value of the path
metric on the path (m, t, q) which falls at the maximum rate. In particular, we have that

$d(1, 0, q) - t \ell (R - I_{\min}) \ge T_i > d(1, 0, q) - t \ell (R - I_{\min}) - t_0$.   (18)

If the correct path falls below this underbound to $T_i$, then threshold $T_i$ is used and at least
$b^t$, $b^t \ge L > b^{t-1}$, nodes in the $q^{\text{th}}$ incorrect subset will have at least $b^t$ computations done
on them. Therefore, the probability that the correct path falls below the underbound of Eq. (18)
underbounds $\Pr[C \ge L]$.

The metric on the $(q + s)^{\text{th}}$ correct node is defined as d(1, 0, q + s). If d(1, 0, q + s) is less
than the underbound to $T_i$, this threshold will be used. This condition is written as

$d(1, 0, q + s) < d(1, 0, q) - t \ell (R - I_{\min}) - t_0$.   (19)

If we let $u_s$ represent the s branches of the correct path which follow node (1, 0, q) and let
$v_s$ be the corresponding section of the received sequence, we have from Eqs. (6) and (7)

$d(1, 0, q + s) - d(1, 0, q) = \sum_{r=1}^{s} \sum_{h=1}^{\ell} \left[ \log_2 \frac{p[v_{rh}/u_{rh}]}{f(v_{rh})} - R \right]$   (20)

$d(1, 0, q + s) - d(1, 0, q) \triangleq I(u_s, v_s) - s \ell R$   (21)

where $u_{rh}$, $v_{rh}$ are the $h^{\text{th}}$ digits on the $r^{\text{th}}$ branches of $u_s$, $v_s$, respectively. Equation (19) is
now rewritten with the aid of Eq. (21).

$I(u_s, v_s) < s \ell R - t \ell (R - I_{\min}) - t_0$.   (22)

Recalling that $b^t \ge L > b^{t-1}$ and remembering that $R = (\log_2 b)/\ell$, we obtain the final result,
namely, if

$I(u_s, v_s) < s \ell R - (\log_2 L + 1) \left( \frac{R - I_{\min}}{R} \right) - t_0$   (23)

then the "static" computation C must equal or exceed L. Therefore, the probability of the event in
Eq. (23) underbounds $\Pr[C \ge L]$. We note that s is arbitrary. It is chosen to maximize the underbound
to $\Pr[C \ge L]$. The desired result then is

$\Pr[C \ge L] \ge \max_s \Pr\left[ I(u_s, v_s) < s \ell R - t_0 - (\log_2 L + 1) \left( \frac{R - I_{\min}}{R} \right) \right]$.   (24)

It should be noted that the random variable $I(u_s, v_s)$ is assigned with probability $\Pr[u_s, v_s]$,
which is the probability that the first s branches of the transmitted and received sequences
following the $q^{\text{th}}$ correct node are $u_s$, $v_s$, respectively. The inequality of Eq. (24) applies to any
particular code and $u_s$ is a codeword (of s branches) in this code.

Let $P_s(x) \triangleq \Pr[I(u_s, v_s) < x]$. The lower bound result is formally summarized below.

Theorem 1.

The "static" computation in the $q^{\text{th}}$ incorrect subset, when the Fano algorithm is used on
the completely connected DMC, has the following bound on its cumulative probability distribution:

$\Pr[C \ge L] \ge \max_s P_s\left( s \ell R - t_0 - (\log_2 L + 1) \left( \frac{R - I_{\min}}{R} \right) \right)$   (25)

where $I_{\min}$ is defined by Eq. (16).
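
To get a feel for the bound of Theorem 1, it can be evaluated exactly in one simple case. The sketch below assumes a binary symmetric channel with crossover probability `eps` below 1/2 and takes the probability-like function to be 1/2 for both outputs, so that each digit contributes log2(2(1 - eps)) when received correctly and log2(2 eps), which is then I_min, when received in error. Under these assumptions the distribution of I(u_s, v_s) is binomial and does not depend on the code. The function name, the parameter names and the finite search range `s_max` are choices made for this illustration; restricting s to a finite range still yields a valid lower bound because any single s does.

```python
import math

def theorem1_bound_bsc(L, eps, b, ell, t0, s_max=60):
    """Evaluate the right-hand side of Eq. (25) for a BSC with f(y) = 1/2.

    L     -- computation level of interest
    eps   -- crossover probability, assumed 0 < eps < 1/2
    b     -- branches per node; ell -- channel digits per branch
    t0    -- threshold spacing
    s_max -- largest number of branches searched over (assumed cutoff)
    """
    R = math.log2(b) / ell
    good = math.log2(2 * (1 - eps))   # per-digit value, no channel error
    bad = math.log2(2 * eps)          # per-digit value on an error (I_min)
    best = 0.0
    for s in range(1, s_max + 1):
        n = s * ell                   # channel digits in s branches
        x = n * R - t0 - (math.log2(L) + 1) * (R - bad) / R
        # P_s(x): total probability of error counts k that give I < x.
        prob = sum(math.comb(n, k) * eps ** k * (1 - eps) ** (n - k)
                   for k in range(n + 1)
                   if k * bad + (n - k) * good < x)
        best = max(best, prob)
    return best
```

Calling the function with b = 2, ell = 2, eps = 0.05 and t0 = 1 for a range of L shows the bound decreasing as L grows, since a larger L lowers the level that I(u_s, v_s) must fall beneath for every s.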

Next we further lower bound Eq. (25) so that the dependence of the bound on L and R
becomes explicit. First, we lower bound $P_s(x)$ in terms of the smallest value of the conditional
probability $p_s(x|u_s)$, defined as the conditional probability that $I(u_s, v_s) < x$ given $u_s$.

$P_s(x) = \sum_{u_s \text{ in the code}} \Pr[u_s] \, p_s(x|u_s)$   (26)

$P_s(x) \ge \min_{\text{all } u_s} p_s(x|u_s)$   (27)

Here the minimum is taken over all words of $s\ell$ digits, not just words in the code. Since Eq. (27)
is independent of code, we shall use it to obtain a bound valid for all codes. Equality holds in
Eq. (27) under certain conditions on the channel and the probability-like function $f(y_j)$. Equality
is equivalent to saying that $P_s(x)$ is independent of the code. The conditions are:

(a) The channel is uniform at the input, i.e., the set of transition
probabilities $\{p(y_j/x_k)\}$, $1 \le j \le J$, is independent of k;

(b) $f(y_j)$ = constant for all $1 \le j \le J$.

In the second major step directed at exhibiting the dependence of the bound on L and R, we
introduce and apply a theorem due to Gallager (Ref. 17). We shall use it to underbound $p_s(x|u_s)$.
Although it is a weaker theorem than the Central Limit Theorem for Large Deviations (Ref. 18),
it is sufficient to demonstrate the dependencies of $\Pr[C \ge L]$ for large L because the two
theorems are asymptotically equal.

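Theorem 2. (Gallager)

Let $\{\xi_i\}$, $1 \le i \le N$, be a set of statistically independent random variables. $\xi_i$ assumes the
J values $w_{ij}$, $1 \le j \le J$, with probabilities $\{P_R(w_{ij})\}$. Let $\xi$ be the sum of these N variables,
$\xi = \sum_{i=1}^{N} \xi_i$. Define $\mu_i(\sigma)$ by†

$\mu_i(\sigma) \triangleq \log_2 \overline{2^{\sigma \xi_i}}$.   (28)

Then,

$\mu(\sigma) \triangleq \log_2 \overline{2^{\sigma \xi}} = \sum_{i=1}^{N} \mu_i(\sigma)$   (29)

and for $\sigma < 0$ we have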


P,^  |^  ^  H^'((r)l  >  2  ^ 


1  1h((t)-(T  ,((t) 


exp 


2N(1  -  p  .  ) 
^  min 


(30)

where the prime indicates differentiation with respect to $\sigma$, and $p_{\min}$ is defined by

$p_{\min} \triangleq \min_i P_R[\min_j w_{ij}]$.   (31)

† The bar notation indicates a statistical average.

To use this theorem in underbounding $p_s(x|u_s)$, we must associate the N random variables
$\{\xi_i\}$ with the random variables appearing in the definition of $p_s(x|u_s)$. We recall that

$p_s(x|u_s) = \Pr[I(u_s, v_s) \le x \mid u_s]$   (32)

where $I(u_s, v_s)$ is defined from Eqs. (20) and (21) as

$I(u_s, v_s) = \sum_{r=1}^{s} \sum_{h=1}^{\ell} \log_2 \frac{p[v_{rh}/u_{rh}]}{f(v_{rh})}$   (33)

and $u_{rh}$, $v_{rh}$ are the $h^{\text{th}}$ of $\ell$ digits on the $r^{\text{th}}$ branches of $u_s$, $v_s$, respectively. With $u_s$ fixed,
this random variable $I(u_s, v_s)$ is assigned with probability

$\Pr[v_s/u_s] = \prod_{r=1}^{s} \prod_{h=1}^{\ell} p[v_{rh}/u_{rh}]$.   [Eq. (2)]

The $s\ell$ random variables

$\left\{ \log_2 \frac{p[v_{rh}/u_{rh}]}{f(v_{rh})} \right\}$

are therefore statistically independent and assigned with probabilities $p[v_{rh}/u_{rh}]$. Thus, if we
make the following identifications, Theorem 2 applies to $p_s(x|u_s)$:

$N \triangleq s\ell$

$i = (r - 1)\ell + h$

$\xi_i \triangleq \log_2 \frac{p[v_{rh}/u_{rh}]}{f(v_{rh})}$

$w_{ij} \triangleq \log_2 \frac{p[y_j/u_{rh}]}{f(y_j)}$

$P_R[w_{ij}] \triangleq p[y_j/u_{rh}]$

$\mu'(\sigma) = x$   (34)

The particular definition of the index i is one which leads to a natural ordering of the $s\ell$ pairs
$(u_{rh}, v_{rh})$.

Before we apply Theorem 2 to $p_s(x|u_s)$ we observe that by decreasing $p_{\min}$ we further
weaken the inequality of Eq. (30). Therefore, we may replace $p_{\min}$ with

$p_{\min} \triangleq \min_{j,k} p[y_j/x_k]$.   (35)


Now let us consider the form of $\mu_i(\sigma)$ and of $\mu(\sigma)$. From the definitions of Eq. (34) we have

$\mu_i(\sigma) = \log_2 \sum_{j=1}^{J} p[y_j/u_{rh}]^{1+\sigma} f(y_j)^{-\sigma}$.   (36)

If we define $Q_s = (q_1, \ldots, q_K)$ as the composition of codeword $u_s$, that is, if $N q_k$ represents the
number of times channel input symbol $x_k$ appears in $u_s$, $\sum_k q_k = 1$, then we have for $\mu(\sigma)$ the
following:

$\mu(\sigma) = \sum_{i=1}^{N} \mu_i(\sigma) = N \sum_{k=1}^{K} q_k \gamma_k(\sigma)$   (37)

where

$\gamma_k(\sigma) = \log_2 \sum_{j=1}^{J} p[y_j/x_k]^{1+\sigma} f(y_j)^{-\sigma}$.   (38)
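
The reduction of the sum over digits to a sum over the composition can be checked numerically. The sketch below computes the per-symbol function of Eq. (38) and compares the digit-by-digit sum with the composition form. The channel, the probability-like function and the sample word are invented for the illustration, and the data layout (`p[(y, x)]`, `f[y]`) and function names are assumptions of this example.

```python
import math
from collections import Counter

def gamma(k, sigma, p, f, outputs):
    """gamma_k(sigma) = log2 sum_j p[y_j|x_k]**(1+sigma) * f(y_j)**(-sigma)."""
    return math.log2(sum(p[(y, k)] ** (1 + sigma) * f[y] ** (-sigma)
                         for y in outputs))

def mu_direct(word, sigma, p, f, outputs):
    """mu(sigma) as the sum of mu_i(sigma) over the digits of the word."""
    return sum(gamma(x, sigma, p, f, outputs) for x in word)

def mu_from_composition(word, sigma, p, f, outputs):
    """mu(sigma) = N * sum_k q_k * gamma_k(sigma), q_k the symbol frequencies."""
    N = len(word)
    counts = Counter(word)
    return N * sum((c / N) * gamma(k, sigma, p, f, outputs)
                   for k, c in counts.items())
```

Because only the symbol counts enter the second form, any reordering of the digits of a word leaves the result unchanged. At sigma = 0 each per-symbol term is log2 of a sum of transition probabilities and is therefore zero.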


All terms of Theorem 2 have been defined so that we may now state the desired lower bound to $p_s(x|u_s)$. If $u_s$ has composition $Q_o$, then,


K 

N  S  qj^[Y^(CT)-oTj^(ff)] 

/  I _  .  .  1  ~  k  =  l 

>  2  ^ 


r2N(i  -  p  .  ) 

1  , 

1  min 

^  J 

P  . 
min 

(39) 
This bound is independent of the order of symbols in the codeword. Therefore, for that (unusual) class of codes having all codewords of the same composition, this lower bound applies directly to all words $u_s$ in the code. Moreover, independence of the order of symbols in a codeword applies to $p_s(x|u_s)$ as well as to its lower bound: it can be shown that $p_s(x|u_s') = p_s(x|u_s)$ when $u_s'$ and $u_s$ have the same composition. It follows that the inequality of Eq. (27) is weaker than necessary for codes of fixed and known composition; for this class of codes we may write

$$p_s(x) = p_s(x|u_s) \tag{40}$$

for any $u_s$ in the code. It should be noted again that Eq. (40) applies only to codes of fixed composition, whereas Eq. (27) applies to all codes.

Our  primary  task  is  to  exhibit  the  dependence  of  the  bound  of  Theorem  1  on  L  and  R.  We 
now  have  the  necessary  tools  to  do  this.  We  use  either  Eq.  (40)  or  Eq.  (27),  depending  on  whether 
the  bound  is  to  apply  to  a  code  of  fixed  composition  or  is  to  apply  to  all  codes,  together  with  the 
bound of Eq. (39) and the inequality of Eq. (25) of Theorem 1. We shall consider the fixed composition case first since it serves as an introduction to the general lower bound.

For fixed $Q_o$, we have from Theorem 1, the definition of Eq. (34), the equivalence of the statement in Eq. (40), Theorem 2 and the bound of Eq. (39) the following lower bound to $P_R[C \ge L]$:

P  [C  >  L]  >  max  p  (x)  >  max  p  (x|u  ) 

n  to  s  & 


1 

K 

k=l 

4 

/  2N(1  -  P  .  ) 

/  min 

—  max  - 

2  exp 

e  i 

1  P  . 

N 

min 

where 

cr  <  0 

X  =  NR  -  F 

Fit^  +  dog^L+l)  (— 

N  ^  si  .  (42) 

The maximization over $N$ in Eq. (41) is taken subject to the following constraint

$$N \sum_{k=1}^{K} q_k \gamma_k'(\sigma) = NR - F$$

or

$$N = \frac{F}{R - \sum_{k=1}^{K} q_k \gamma_k'(\sigma)} \tag{43}$$

which is implied by the first equation in Eq. (42), the last equation in Eq. (34) and the definition of $\mu(\sigma)$, Eq. (37). The function $F$ is independent of $\sigma$ and $N$, and is constant with respect to the maximization.
Strictly speaking, the maximization on $N$ must be taken only for values of $N$ which are multiples of $\ell$, the number of digits per tree branch. We now drop this constraint and permit $N$ to assume all values $1 \le N < \infty$. The imprecision introduced neither affects the character of the end result nor materially alters its numerical value.

Let us now consider the connection between $N$ and $\sigma$ from the second equation in Eq. (43). One can show that


(44) 


where $\xi$ assumes the same values as does $\xi_i$ of Eq. (34) but it is assigned each such value with the probability


H 


(45) 


when $u_{rh} = x_k$. Consequently, $\gamma_k'(\sigma)$ is monotone increasing in $\sigma$, which implies that $N$ is monotone increasing in $\sigma$. Since $0 \le N < \infty$, we must restrict $\sigma$ in Eq. (43) to be less than the value at which $N$ is infinite. We shall impose this restriction implicitly by extending the definition of $1/[R - \sum_k q_k \gamma_k'(\sigma)]$ so that it is infinite for $\sigma$ larger than the critical value. At the end of the next paragraph, it will become clear that this extension does not affect the maximization, serving only to simplify the analysis.

We return now to the maximization of Eq. (37). If $h(N)$ and $g(N)$ are positive, then

$$\max_N h(N)\, g(N) \ge \left[\max_N h(N)\right] g(N') \tag{46}$$

where $N'$ may assume any value. Thus, if we maximize Eq. (41) with respect to the first of the two factors, we further lower bound $P_R[C \ge L]$. The maximum of the first factor occurs at the maximum of the exponent

$$N\epsilon(\sigma) \triangleq N \sum_{k=1}^{K} q_k \left[\gamma_k(\sigma) - \sigma\gamma_k'(\sigma)\right] \tag{47}$$

Let us study this exponent. It is negative since $\epsilon(\sigma)$ is negative. We see this by observing that $\epsilon(\sigma)$ assumes value zero at $\sigma = 0$ and has derivative $\sum_{k=1}^{K} q_k(-\sigma)\gamma_k''(\sigma) \ge 0$ for $\sigma \le 0$, the range of $\sigma$ of interest. To determine whether the exponent $N\epsilon(\sigma)$ has a maximum in $\sigma$, we take the first derivative with respect to $\sigma$.


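$$\frac{d}{d\sigma}\, N\epsilon(\sigma) = F\, \frac{d}{d\sigma} \left\{ \frac{\sum_{k=1}^{K} q_k \left[\gamma_k(\sigma) - \sigma\gamma_k'(\sigma)\right]}{R - \sum_{k=1}^{K} q_k \gamma_k'(\sigma)} \right\} = F\, \frac{\left[\sum_{k=1}^{K} q_k \gamma_k''(\sigma)\right] \left[R - \sum_{k=1}^{K} q_k \gamma_k(\sigma)/\sigma\right]}{\left[R - \sum_{k=1}^{K} q_k \gamma_k'(\sigma)\right]^2}\, (-\sigma) \tag{48}$$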
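All factors are positive for $\sigma < 0$, with the possible exception of the term $R - \sum_{k=1}^{K} q_k [\gamma_k(\sigma)]/\sigma$. Since $\sum_{k=1}^{K} q_k \gamma_k(\sigma)/\sigma$ has derivative

$$\frac{d}{d\sigma} \sum_{k=1}^{K} q_k \frac{\gamma_k(\sigma)}{\sigma} = -\frac{1}{\sigma^2} \sum_{k=1}^{K} q_k \left[\gamma_k(\sigma) - \sigma\gamma_k'(\sigma)\right] \ge 0 \tag{49}$$

we find that $R - \sum_{k=1}^{K} q_k [\gamma_k(\sigma)]/\sigma$ is positive for $\sigma < \sigma_o$ and negative for $\sigma > \sigma_o$ where $\sigma_o$ is such that

$$R = \sum_{k=1}^{K} q_k \frac{\gamma_k(\sigma_o)}{\sigma_o} \tag{50}$$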

We can now sketch $N\epsilon(\sigma)/F$ for $\sigma \le 0$ (see Fig. 12). It is negative for $\sigma < 0$ and has a maximum at $\sigma = \sigma_o$. The value of this maximum is

$$\frac{N\epsilon(\sigma_o)}{F} = \frac{\epsilon(\sigma_o)}{R - \sum_{k=1}^{K} q_k \gamma_k'(\sigma_o)} = \sigma_o \tag{51}$$

i.e., the maximum $(\sigma_o, \sigma_o)$ lies on a straight line of slope one passing through the origin. For $\sigma < \sigma_o$, $R > \sum_k q_k [\gamma_k(\sigma)]/\sigma$ so that $N\epsilon(\sigma)/F > \sigma$, that is, $N\epsilon(\sigma)/F$ lies above the unit slope line passing through the origin for $\sigma < \sigma_o$. Maximizing $N\epsilon(\sigma)$ over $N$ is equivalent to maximizing this exponent over $\sigma$ where $N$ and $\sigma$ are related by Eq. (43). Therefore, the maximum of the first term in Eq. (41), $2^{\sigma_o F}$, is related parametrically to the rate $R$ by Eq. (50).

The final bound is obtained if in the second factor of Eq. (41) we use $N' = N(\sigma_o)$, the value of $N$ which maximizes the first factor. Then using Eq. (46) we have for the fixed composition case


A  <T  F 
1  o 


[C  >L]  >-2  ^  exp 


-±4f 


1  -  P 


c(a  )  P 
o  min 


(52) 


Fig. 12. Behavior of $N\epsilon(\sigma)/F$ with $\sigma$.

where $\sigma_o \le 0$ is such that

$$R = \sum_{k=1}^{K} q_k \frac{\gamma_k(\sigma_o)}{\sigma_o} \qquad \text{[Eq. (50)]}$$

The range $-1 \le \sigma_o \le 0$ suffices since, as shown in Eq. (49), the sum in Eq. (50) is monotone increasing in $\sigma$, being negative for $\sigma < -1$. This is the lower bound result for the fixed composition case. We must now consider the general lower bound, valid for all codes. We shall use many of the results obtained above.

To obtain the general lower bound, we lower bound $P_R[C \ge L]$ using Theorem 1 and inequalities (27) and (39).


R 


1 

.Nc((r) 

4 

/  2N(1  -  P  .  )  1 
/  min 

y  max 

min 

2  exp 

/  P 

N 

So 

L  ^ 

'  mm 

(53) 


where $\sigma \le 0$. We would like to focus attention on the first of the two factors above. We justify our doing this as follows: Let $h(N, Q_o),\ g(N, Q_o) \ge 0$. Then,

$$h(N, Q_o) \ge \min_{Q_o} h(N, Q_o), \qquad g(N, Q_o) \ge \min_{Q_o} g(N, Q_o)$$

so that

$$h(N, Q_o)\, g(N, Q_o) \ge \Big\{\min_{Q_o} h(N, Q_o)\Big\} \Big\{\min_{Q_o} g(N, Q_o)\Big\}$$

and

$$\min_{Q_o} \{h(N, Q_o)\, g(N, Q_o)\} \ge \min_{Q_o} \{h(N, Q_o)\}\, \min_{Q_o} \{g(N, Q_o)\}$$

$$\max_N \min_{Q_o} \{h(N, Q_o)\, g(N, Q_o)\} \ge \max_N \min_{Q_o} \{h(N, Q_o)\}\, \min_{Q_o} \{g(N', Q_o)\} \tag{54}$$

In the last step we have used Eq. (46). Thus, if we minimize the second term in Eq. (53) on $Q_o$ and use in it the value of $\sigma$ which achieves the max-min of the first term we will have a valid lower bound. We minimize the second factor on $Q_o$ if we maximize $N'$ on $Q_o$.

$$N_{\max}(\sigma) \triangleq \max_{Q_o} N'(\sigma) = \frac{F}{R - \max_k \gamma_k'(\sigma)} \tag{55}$$


Then,  we  have 


R 


'  max  min  N€(a)' 

1 

N  Q 

2  ^ 

'  exp 

4 

/  2N  (ct)  (1  -  P  .  ) 

/  max  mm 

2  ^ 

1  P  . 

mm 

(56) 


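Our next concern is with the max-min of $N\epsilon(\sigma)$. We assert that the minimum on $Q_o$ (the components of $Q_o$ are positive and sum to one) of $N\epsilon(\sigma)$ occurs when $Q_o$ has a single nonzero component, having the value unity. This component $q_{k_o}$ is such that for fixed $\sigma \le 0$

$$\frac{\gamma_{k_o}(\sigma) - \sigma\gamma_{k_o}'(\sigma)}{R - \gamma_{k_o}'(\sigma)} \le \frac{\gamma_k(\sigma) - \sigma\gamma_k'(\sigma)}{R - \gamma_k'(\sigma)} \qquad \text{all } k \tag{57}$$

This assertion is proved as follows: Let $\delta$ be defined as the difference between $N\epsilon(\sigma)/F$ for arbitrary $Q_o$ and the value of $N\epsilon(\sigma)/F$ at the supposed minimum on $Q_o$. Then we have

$$\delta \triangleq \frac{\sum_{k=1}^{K} q_k \left[\gamma_k(\sigma) - \sigma\gamma_k'(\sigma)\right]}{R - \sum_{k=1}^{K} q_k \gamma_k'(\sigma)} - \frac{\gamma_{k_o}(\sigma) - \sigma\gamma_{k_o}'(\sigma)}{R - \gamma_{k_o}'(\sigma)} \tag{58}$$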


Using Eq. (57) and remembering that by extension of its definition $R - \sum_{k=1}^{K} q_k \gamma_k'(\sigma)$ cannot be negative for any $Q_o$, we see that $\delta \ge 0$. We also observe that $\delta = 0$ for the assumed composition. Thus, this composition achieves the minimum. Now if we sketch $[\gamma_k(\sigma) - \sigma\gamma_k'(\sigma)]/[R - \gamma_k'(\sigma)]$ for each $k$ and all values of $\sigma \le 0$ (keeping in mind that $R - \gamma_k'(\sigma) \ge 0$ by extension of its definition) we see that we achieve the minimum $N\epsilon(\sigma)$ on $Q_o$ by taking the lower envelope of these functions (see Fig. 13). Notice that the maxima of the individual functions occur on the straight line of unit slope passing through the origin. The maximum of the $k$th function occurs at $\sigma = \sigma_k$ where $\sigma_k$ is such that $R = \gamma_k(\sigma_k)/\sigma_k$. For $\sigma < \sigma_k$ the $k$th function lies above the unit slope straight line passing through the origin.

Figure 13 provides a graphical interpretation of the function $\min_{Q_o} N\epsilon(\sigma)$ vs $\sigma$. We now concentrate on maximizing this minimum on $N$ or, equivalently, on $\sigma \le 0$. We assert that this maximum occurs in Fig. 13 on the straight line of unit slope. This should be clear from the figure. If $\sigma_k$ is such that $R = \gamma_k(\sigma_k)/\sigma_k$, that is, if $\{\sigma_k\}$ are the loci of the maxima, we further assert that the maximum over $\sigma$ of $\min_{Q_o} N\epsilon(\sigma)$ occurs for $\sigma$ equal to the smallest of the $\sigma_k$. This too should be clear from the figure.

We have found that the max-min of the exponent $N\epsilon(\sigma)$ occurs at the maximum $\sigma_k$ of one of the functions $[\gamma_k(\sigma) - \sigma\gamma_k'(\sigma)]/[R - \gamma_k'(\sigma)]$, and that this particular maximum is the smallest of the maxima. At the particular maximum we have

$$\max_N \min_{Q_o} N\epsilon(\sigma) = \sigma_o F \tag{59}$$

where $\sigma_o$ is the smallest of the $\sigma_k$ satisfying $R = \gamma_k(\sigma_k)/\sigma_k$. Since $\gamma_k(\sigma)/\sigma$ is monotone increasing in $\sigma$ from Eq. (55), we see from Fig. 14 that the smallest $\sigma_k$, as a function of $R$, is the solution to the equation:

$$R = \max_k \frac{\gamma_k(\sigma_o)}{\sigma_o} \tag{60}$$

If we now choose $\sigma = \sigma_o$ in $N_{\max}(\sigma)$, the value of $N'$ in the exponent of the second factor of Eq. (56), we have

$$N_{\max}(\sigma_o) = \frac{F}{\max_k \dfrac{\gamma_k(\sigma_o)}{\sigma_o} - \max_k \gamma_k'(\sigma_o)} \tag{61}$$

The denominator is positive because $\gamma_k(\sigma)/\sigma \ge \gamma_k'(\sigma)$ as implied by the fact that $\gamma_k(\sigma) - \sigma\gamma_k'(\sigma) = \epsilon(\sigma) \le 0$, for $\sigma \le 0$.

The complete general lower bound to $P_R[C \ge L]$, valid for all codes, can now be stated.


i  Fa 

[C  >L]  2  ^  exp 


e 


2(  1  -  P  .  ) 
mm 


P  I  max  -  —  max  yJ  (a  ) 

mm  I  ^  <T^  k  ° 


(62) 


where $\sigma_o$ is the solution to the equation

$$R = \max_k \frac{\gamma_k(\sigma_o)}{\sigma_o} \qquad \text{[Eq. (60)]}$$

We collect the lower bounds to $P_R[C \ge L]$ for the two cases in the following theorem.

Theorem 3.

On the completely connected DMC, the random variable of "static" computation $C$ has the following lower bound to its cumulative probability distribution function, $P_R[C \ge L]$:


i  For 

Pj^  (C  >  L]  >  i  2  °  exp 


4  ^min* 

- 

min 


(63) 


where

$$P_{\min} \triangleq \min_{j,k}\, p[y_j/x_k] \qquad \text{[Eq. (35)]}$$


F=t,  +  (log2L+l)(^-^) 


A  .  ,  p(y/\' 

^min  =  '"V"  f(y.) 

h  k 


(Eq.(42)] 
[Eq.  (16)] 


and $f(y_j)$ is a probability-like function of output symbol $y_j$, interpreted as the probability of $y_j$ when channel inputs are assigned with probabilities $\{p_k\}$, $1 \le k \le K$.

$$f(y_j) \triangleq \sum_{k=1}^{K} p_k\, p[y_j/x_k] \tag{64}$$

The function $A(\sigma_o)$ and the parameter $\sigma_o$ are related parametrically to the rate $R$. The relationship depends on whether the bound applies to all codes or to codes of known and fixed composition.

(1)  For  a  code  of  fixed  composition  =  (q^,  .  .  . ,  q^^)  we  have 


A(or  )  = 
o 


5  p!''''”'  ,  , 


(65) 
$$R = \sum_{k=1}^{K} q_k \frac{\gamma_k(\sigma_o)}{\sigma_o} \quad \text{for } -1 \le \sigma_o \le 0 \qquad \text{[Eq. (50)]}$$


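(2) For all codes we may choose

$$A(\sigma_o) \triangleq \max_k \frac{\gamma_k(\sigma_o)}{\sigma_o} - \max_k \gamma_k'(\sigma_o) \tag{66}$$

$$R = \max_k \frac{\gamma_k(\sigma_o)}{\sigma_o} \quad \text{for } -1 \le \sigma_o \le 0 \qquad \text{[Eq. (60)]}$$

Here $\gamma_k(\sigma)$ is defined as

$$\gamma_k(\sigma) \triangleq \log_2 \sum_{j=1}^{J} p[y_j/x_k]^{1+\sigma} f(y_j)^{-\sigma} \qquad \text{[Eq. (38)]}$$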


An important observation can be drawn immediately from the bound of Eq. (63). For very large $F$, corresponding to very large $L$, the bound is controlled almost entirely by the factor $2^{F\sigma_o}$. Thus, the bound behaves as $(1/L)^{(-\sigma_o)(R - I_{\min})/R}$ for large $L$, so that the distribution is algebraic with large $L$.
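
As a rough numerical illustration of how the rate equation [Eq. (60)] ties $\sigma_o$ to $R$, the following Python sketch evaluates $\gamma_k(\sigma)$ from Eq. (38) and solves $R = \max_k \gamma_k(\sigma_o)/\sigma_o$ by bisection, relying on the monotonicity of $\gamma_k(\sigma)/\sigma$ noted above. The binary symmetric channel with crossover probability 0.1, the equiprobable input assignment, the rate 0.2, and all function names are assumptions chosen for illustration; none of them comes from the report.

```python
import math

def gamma_k(sigma, p_row, f):
    """gamma_k(sigma) = log2 sum_j p[y_j/x_k]^(1+sigma) * f(y_j)^(-sigma), as in Eq. (38)."""
    return math.log2(sum(p ** (1.0 + sigma) * fj ** (-sigma)
                         for p, fj in zip(p_row, f)))

def rate_of_sigma(sigma, channel, f):
    """Right-hand side of Eq. (60): max_k gamma_k(sigma)/sigma, for sigma < 0."""
    return max(gamma_k(sigma, row, f) / sigma for row in channel)

def solve_sigma_o(R, channel, f, tol=1e-12):
    """Bisection on -1 < sigma < 0; valid because rate_of_sigma increases with sigma."""
    lo, hi = -1.0, -1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_of_sigma(mid, channel, f) < R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative channel (assumption): BSC with crossover 0.1, equiprobable inputs.
eps = 0.1
channel = [[1 - eps, eps], [eps, 1 - eps]]   # channel[k][j] = p[y_j/x_k]
p_in = [0.5, 0.5]
f = [sum(p_in[k] * channel[k][j] for k in range(2)) for j in range(2)]  # Eq. (64)

R = 0.2
sigma_o = solve_sigma_o(R, channel, f)
print("sigma_o =", sigma_o)
```

Since the bound decays algebraically in $L$ with an exponent proportional to $-\sigma_o$, repeating the calculation for several rates gives a quick feel for how strongly the tail depends on $R$.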
CHAPTER IV

"RANDOM CODE" BOUND ON THE DISTRIBUTION OF COMPUTATION


The previous chapter has established the algebraic character of the distribution of "static" computation. In this chapter, we shall obtain an overbound to the distribution of computation averaged over the ensemble of all tree codes. By so doing, we show that a large number of codes exists whose distribution of "static" computation is bounded by a multiple of the average. Together, the results of this chapter and of the preceding chapter delimit the tail behavior of the distribution of computation. Chapter V will interpret and relate the results of these two chapters.


A.  RANDOM  VARIABLE  OF  COMPUTATION 

The approach we use to bound the ensemble average of the distribution of computation requires that we overbound the random variable of "static" computation. The discussion of Chapter II is sufficient to allow a bound on this random variable. We repeat the pertinent arguments of that chapter.

"Static" computation associated with the $q$th incorrect subset is defined as the number of forward or backward "looks" required by the decoder in the incorrect subset associated with the $q$th node of the correct path. This subset consists of the $q$th correct node, labeled $(1, 0, q)$, and of nodes on paths disjoint from that portion of the correct path which extends beyond $(1, 0, q)$. A particular node of this type is labeled $(m, s, q)$ to indicate that it is in the $q$th incorrect subset, is at "penetration" $s$, that is, is connected to $(1, 0, q)$ through $s$ branches, and is $m$th in order among the $M(s)$ nodes at penetration $s$. The number of nodes at penetration $s$, $M(s)$, is defined below.

$$M(0) = 1$$

$$M(s) = (b - 1)\, b^{s-1} \quad \text{for } s \ge 1. \qquad \text{[Eq. (1)]}$$

The $q$th correct node, or the reference node $(1, 0, q)$, is said to be at penetration zero in the $q$th incorrect subset.
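
The node count of Eq. (1) can be checked by direct enumeration. The sketch below assumes a tree with $b$ branches leaving every node, in which exactly one branch out of the reference node belongs to the correct path; the function names and the convention that branch index 0 is the correct branch are illustrative choices, not taken from the report.

```python
def nodes_at_penetration(b, s):
    """Closed form: M(0) = 1 and M(s) = (b - 1) * b**(s - 1) for s >= 1."""
    return 1 if s == 0 else (b - 1) * b ** (s - 1)

def count_by_enumeration(b, s):
    """Enumerate the paths of s branches that leave the reference node
    along one of its b - 1 incorrect branches (index 0 is taken as correct)."""
    if s == 0:
        return 1  # the reference node itself
    frontier = [(first,) for first in range(1, b)]
    for _ in range(s - 1):
        frontier = [path + (nxt,) for path in frontier for nxt in range(b)]
    return len(frontier)

for b in (2, 3, 4):
    print(b, [nodes_at_penetration(b, s) for s in range(5)])
```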

A "path metric" $d(m, s, q)$ on node $(m, s, q)$ has been defined. If $\Theta$ is the generic symbol representing the path terminating on node $(m, s, q)$, then the path metric on this path of $n = q + s$ branches is defined as follows:

$$d(m, s, q) = \sum_{r=1}^{n} \sum_{h=1}^{\ell} \left[I(\theta_{rh}, v_{rh}) - R\right] \tag{67}$$

where $\theta_{rh}$, $v_{rh}$ are the $h$th digits (of $\ell$ digits) on the $r$th branches of $\Theta$ and $v_n$, the received sequence of $n$ branches.† The function $I(\theta_{rh}, v_{rh})$ is defined by

$$I(\theta_{rh}, v_{rh}) = \log_2 \frac{p[v_{rh}/\theta_{rh}]}{f(v_{rh})} \tag{68}$$

where $f(v_{rh})$ is a probability-like function, interpreted as the probability of channel output symbol $v_{rh}$ when channel inputs are assigned with probabilities $\{p_k\}$, $1 \le k \le K$. That is, when $v_{rh} = y_j$, we have

$$f(y_j) = \sum_{k=1}^{K} p_k\, p[y_j/x_k] \qquad \text{[Eq. (64)]}$$

† The subscript $n$ on subsequences of the transmitted or received sequences, namely $u_n$, $v_n$, indicates their length in branches from the origin. The subscripts $r_o$ or $s$ indicate their length from the reference node $(1, 0, q)$.

Later in this chapter, we will find that $f(y_j)$ is equal to a probability appearing in the "random code" argument.
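
To make the path metric concrete, here is a minimal Python sketch that sums the per-digit terms $I(\theta_{rh}, v_{rh}) - R$ over a path. The binary symmetric channel with crossover probability 0.1, the equiprobable input assignment, the rate $R = 0.25$, and the function names are all assumptions made for illustration.

```python
import math

def digit_metric(theta, v, channel, f, R):
    """One term of the path metric: log2(p[v/theta] / f(v)) - R."""
    return math.log2(channel[theta][v] / f[v]) - R

def path_metric(theta_digits, v_digits, channel, f, R):
    """Sum of the per-digit terms over all digits of the path."""
    return sum(digit_metric(t, v, channel, f, R)
               for t, v in zip(theta_digits, v_digits))

# Illustrative channel (assumption): BSC, crossover 0.1, equiprobable inputs.
eps = 0.1
channel = [[1 - eps, eps], [eps, 1 - eps]]   # channel[x][y] = p[y/x]
f = [0.5, 0.5]                               # f(y) for equiprobable inputs
R = 0.25

received = [0, 1, 1, 0]
agreeing_path = [0, 1, 1, 0]      # hypothesis matching every received digit
disagreeing_path = [0, 1, 0, 1]   # hypothesis differing in two digits

print(path_metric(agreeing_path, received, channel, f, R))
print(path_metric(disagreeing_path, received, channel, f, R))
```

With these numbers each agreeing digit raises the metric and each disagreeing digit lowers it sharply, which is the behavior the threshold search relies on.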

With this path metric, the Fano decoder searches paths in the tree code trying to find a path which tends to increase in path metric. A set of criteria $T_i = i\,t_o$ is defined. A path whose path metric tends to cross an increasing sequence of criteria will with high probability be the correct path. As the machine searches for the correct path it must perform a number of forward or backward "looks" from nodes in the tree. We are concerned with a subset of the total computation ever performed, which consists of the number of computations eventually performed in the $q$th incorrect subset. Since the machine computation depends on increments in the path metric, we may choose to let the value of the metric, $d(1, 0, q)$, on the first node of this subset, $(1, 0, q)$, lie between $T_0 = 0$ and $T_1 = t_o$, that is, we may assume that $0 \le d(1, 0, q) < t_o$.

We found in Chapter II that the computation in the $q$th incorrect subset depends on the minimum value of the path metric at or following the reference node $(1, 0, q)$ and on the trajectories of the individual incorrect paths. Let $D$ be the correct path minimum at or following $(1, 0, q)$, and let $T_D$ be the threshold just below $D$. We overbound computation on a particular node $(m, s, q)$ by disregarding the history of the path preceding this node, looking only at the value of the metric $d(m, s, q)$ on this particular node. If $d(m, s, q)$ is in a favorable position, we include node $(m, s, q)$ in our computation count. As discussed in Chapter II, $d(m, s, q)$ is in a favorable position if $d(m, s, q) \ge T_D$. In particular, if $d(m, s, q) \ge T_k \ge T_D$ the machine may do as many as $(b + 1)$ computations on node $(m, s, q)$ with each such threshold $T_k$. If $T_k > d(m, s, q)$, the machine never does any computation on $(m, s, q)$ with $T_k$.

Before we define a random variable which overbounds the random variable of "static" computation, we further consider the metric $d(m, s, q)$. Let $d(m, s)$ be the change in $d(m, s, q)$ from the value of the metric on the reference node, $d(1, 0, q)$. Then, if $\Theta$ now represents the $s$ branches of the $q$th incorrect subset preceding the node $(m, s, q)$, and if $v_s$ represents the corresponding portion of the received sequence, we have

$$d(m, s) \triangleq d(m, s, q) - d(1, 0, q) = I(\Theta, v_s) - s\ell R \triangleq \sum_{r=1}^{s} \sum_{h=1}^{\ell} \left[I(\theta_{rh}, v_{rh}) - R\right] \tag{69}$$

where $I(\theta_{rh}, v_{rh})$ is defined by Eq. (68). Then, since we have assumed that $d(1, 0, q)$ lies between $T_0 = 0$ and $T_1 = t_o$, we have that $d(m, s, q) \le d(m, s) + t_o$. If $d(m, s, q)$ is replaced with this larger value for each node $(m, s, q)$ the computation required on nodes $\{(m, s, q)\}$ is increased, because these nodes may be examined with a larger number of thresholds. (The correct path minimum $D$ is not changed.) Now, if we decrease by an equal amount the value of the path metric on each correct node following the reference node, we further increase the computation on nodes $\{(m, s, q)\}$. If we let $u_{r_o}$, $v_{r_o}$ be the $r_o$ branches of the transmitted and received sequences following the reference node, and define $d(u_{r_o}, v_{r_o})$ as the change in the value of the metric from $d(1, 0, q)$ to $d(1, 0, q + r_o)$, the value of the metric on the $(q + r_o)$th correct node, we have

$$d(u_{r_o}, v_{r_o}) \triangleq d(1, 0, q + r_o) - d(1, 0, q) = I(u_{r_o}, v_{r_o}) - r_o \ell R \triangleq \sum_{r=1}^{r_o} \sum_{h=1}^{\ell} \left[I(u_{rh}, v_{rh}) - R\right] \tag{70}$$
r=l  h=l 

We note that $d(1, 0, q) \ge 0$ so that $d(1, 0, q + r_o) \ge d(u_{r_o}, v_{r_o})$. If $d(1, 0, q + r_o)$ is replaced with $d(u_{r_o}, v_{r_o})$, computation on the incorrect nodes $\{(m, s, q)\}$ is increased. We are now prepared to define an overbound to the random variable of "static" computation.

Using $d(m, s) + t_o$ for $d(m, s, q)$ and $d(u_{r_o}, v_{r_o})$ for $d(1, 0, q + r_o)$, $r_o \ge 0$, we raise the value of the metric on incorrect nodes and lower the value of the metric on correct nodes following the reference node. Thus, we overbound the computation on incorrect nodes. Equivalently, we overbound "static" computation. Now, as discussed above, the machine may do as many as $(b + 1)$ computations on node $(m, s, q)$ with threshold $T_k$ if $d(m, s) + t_o \ge T_k \ge T_{D'}$, where $T_{D'}$ is the threshold just below $D'$, the correct path minimum with the metric $d(u_{r_o}, v_{r_o})$. No computation is required on $(m, s, q)$ with $T_k$ if $d(m, s) + t_o < T_k$. Therefore, if there are $N$ thresholds between $d(m, s) + t_o$ and $T_{D'}$, including $T_{D'}$, the machine may do as many as $(b + 1)N$ computations on node $(m, s, q)$; $N$ is a random variable. A convenient representation for $N$ in terms of the upper bound to the value of the metric on node $(m, s, q)$, $d(m, s) + t_o$, and the lower bound to the value of the metric on nodes of the correct path, $d(u_{r_o}, v_{r_o})$, is had with the random variable $z_{i,s}(m)$. We define $z_{i,s}(m) = 1$ if $d(m, s) + t_o \ge T_i$ (that is, $d(m, s) \ge T_{i-1}$, since $T_i = i\,t_o$) and if $d(u_{r_o}, v_{r_o}) \le T_{i+1}$ for some $r_o \ge 1$. If these conditions are not satisfied, $z_{i,s}(m) = 0$. This type of random variable is called a characteristic function. Then,

$$z_{i,s}(m) = \begin{cases} 1 & \text{if } d(m, s) \ge T_{i-1} \text{ and } d(u_{r_o}, v_{r_o}) \le T_{i+1} \text{ for some } r_o \\ 0 & \text{otherwise} \end{cases} \tag{71}$$

A little reflection indicates that

$$\sum_{i=-\infty}^{\infty} z_{i,s}(m) \ge N$$

the number of thresholds between $d(m, s) + t_o$ and $T_{D'}$. Therefore,

$$(b + 1) \sum_{i=-\infty}^{\infty} z_{i,s}(m)$$

is an overbound to the computation on node $(m, s, q)$. If this quantity is summed over all nodes in the $q$th incorrect subset, that is, for $1 \le m \le M(s)$, $0 \le s$, we have an overbound to the random variable of "static" computation $C$ in the $q$th incorrect subset. Hence,

$$C \le (b + 1) \sum_{i=0}^{\infty} \sum_{s=0}^{\infty} \sum_{m=1}^{M(s)} \left\{ z_{i,s}(m) + z_{-i,s}(m) \right\} \tag{72}$$

where $M(s)$ is given by Eq. (1) and the $i = 0$ term is repeated twice.
We are now prepared to overbound the distribution of computation using a "random code" argument.

B. MOMENTS OF COMPUTATION

Although a lower bound to the distribution of computation $P_R[C \ge L]$ was found by considering an appropriately chosen subset of the set of events leading to $L$ or more computations, if we
are  to  overbound  this  distribution,  we  must  consider  every  event  which  may  lead  to  L  or  more 
computations.  We  have  overbounded  the  random  variable  of  computation  to  simplify  the  analysis 
and  to  include  each  event  which  might  contribute  to  computation. 

The  technique  which  we  shall  employ  to  overbound  the  distribution  is  to  bound  the  moments 
of  computation  and  use  a  generalized  form  of  Chebysheff's  Inequality. 

Lemma 1. (Chebysheff's Inequality)

If $C$ is a positive random variable, then

$$P_R[C \ge L] \le \frac{\overline{C^p}}{L^p}, \qquad p \ge 0 \tag{73}$$

where $\overline{C^p}$ is the $p$th moment of $C$.

Proof.

$$\overline{C^p} \ge \sum_{c \ge L} c^p\, p(c) \ge L^p \sum_{c \ge L} p(c)$$

where $p(c)$ is the probability that the random variable $C$ assumes value $c$. Q.E.D.

The following two examples indicate the "tightness" that might be expected with Chebysheff's Inequality.

Example 1: Let $C$ assume values $0$, $c_o$ with probabilities $1 - a$, $a$, respectively; then

$$\overline{C^p} = a\, c_o^p \qquad \text{and} \qquad P_R[C \ge L] \le a \left(\frac{c_o}{L}\right)^p$$

For $L = c_o$, the bound is exact.
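
Example 1 is easy to check numerically. The short Python sketch below computes the moment bound of Eq. (73) for a two-point distribution; the particular values $c_o = 8$, $a = 0.05$, $p = 3$ and the function name are illustrative assumptions.

```python
def moment_bound(moment_p, L, p):
    """Right-hand side of Eq. (73): (p-th moment of C) / L**p."""
    return moment_p / L ** p

# Two-point distribution of Example 1 (values chosen for illustration).
c_o, a, p = 8.0, 0.05, 3
moment_p = a * c_o ** p          # p-th moment: only the value c_o contributes

bound_at_c_o = moment_bound(moment_p, c_o, p)    # equals a: the bound is exact
bound_at_half = moment_bound(moment_p, c_o / 2, p)

# For 0 < L <= c_o the true tail probability P[C >= L] is a.
print(bound_at_c_o, bound_at_half)
```

For $L$ below $c_o$ the bound grows as $(c_o/L)^p$ while the true probability stays at $a$, so the bound is tight only at $L = c_o$.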

Example 2: Let $C \ge 1$ be a continuous random variable with density $p(C) = A/C^{\alpha}$ where $\alpha > 1$ and $A = \alpha - 1$. Then, for $p < \alpha - 1$

$$\overline{C^p} = \frac{\alpha - 1}{\alpha - 1 - p}$$

and

$$P_R[C \ge L] \le \frac{\alpha - 1}{\alpha - 1 - p} \cdot \frac{1}{L^p}$$

As $p$ approaches $\alpha - 1$, the moment (hence the bound) becomes indefinitely large. However, the behavior of the tail as a function of $L$ more closely approximates the true tail behavior.
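
The trade-off in Example 2 can be seen numerically. For the density $(\alpha - 1)/C^{\alpha}$ on $C \ge 1$, integrating the tail gives $P_R[C \ge L] = L^{-(\alpha - 1)}$, a standard calculus result that is not stated explicitly in the example. The sketch below compares this true tail with the moment bound for several $p$; the choice $\alpha = 3$ and the function names are assumptions for illustration.

```python
def true_tail(L, alpha):
    """Integral of (alpha - 1) / C**alpha from L to infinity, for L >= 1."""
    return L ** (-(alpha - 1.0))

def moment_bound(L, alpha, p):
    """Chebysheff bound of Example 2; requires p < alpha - 1."""
    if not p < alpha - 1.0:
        raise ValueError("the p-th moment is infinite for p >= alpha - 1")
    return (alpha - 1.0) / (alpha - 1.0 - p) / L ** p

alpha = 3.0  # illustrative choice: true tail decays as L**-2
for p in (0.5, 1.0, 1.5, 1.9):
    row = [(L, moment_bound(L, alpha, p) / true_tail(L, alpha))
           for L in (2.0, 10.0, 100.0)]
    print(p, row)  # ratio of bound to true tail, always >= 1
```

As $p$ nears $\alpha - 1$ the ratio depends less on $L$ (the decay rate of the bound approaches the true one) while the constant factor $(\alpha - 1)/(\alpha - 1 - p)$ grows without limit.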

Judging  from  Example  2  and  the  fact  that  the  distribution  of  computation  is  algebraic,  we 
should  expect  that  the  application  of  Lemma  1  will  lead  to  a  bound  which  degenerates  rapidly  as 
the  tail  behavior  of  the  bound  approaches  that  of  the  true  distribution.  This  phenomenon  will 
appear  in  our  results. 

Moments  of  computation  cannot,  as  a  rule,  be  computed  directly  for  any  arbitrary  code. 
We  can,  however,  compute  these  moments  over  the  ensemble  of  all  possible  tree  codes,  and 
deduce that at least one code has moments less than the ensemble average. The ensemble of codes is generated by assigning probabilities to the codes in such a way that each digit (there are $\ell$ per tree branch) is statistically independent and identically distributed and is assigned with probabilities $\{p_k\}$; that is, channel digit $x_k$ occurs on a branch in a code with probability $p_k$. Note that we have deliberately chosen the probability assignment used to compute $f(y_j)$, Eq. (64).

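As the last topic in this section, we introduce Minkowski's Inequality (see the Appendix for proof).

Lemma 2. (Minkowski's Inequality)

Let $\{w_h\}$, $1 \le h \le H$, be a set of positive random variables. Then

$$\left[\, \overline{\Big(\sum_{h=1}^{H} w_h\Big)^p}\, \right]^{1/p} \le \sum_{h=1}^{H} \left[\, \overline{w_h^p}\, \right]^{1/p}, \qquad p \ge 1 \tag{74}$$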
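
A quick numerical sanity check of Minkowski's Inequality on a small finite sample space is sketched below. The three outcome probabilities, the values of the three random variables, and the helper names are arbitrary illustrative assumptions; the variables are deliberately dependent (they are functions of the same outcome), which the inequality permits.

```python
def p_norm(values, probs, p):
    """(E[X**p])**(1/p) for a positive random variable on a finite sample space."""
    return sum(pr * v ** p for v, pr in zip(values, probs)) ** (1.0 / p)

# Illustrative sample space with three outcomes and three random variables.
probs = [0.2, 0.3, 0.5]
variables = [
    [1.0, 4.0, 2.0],
    [3.0, 0.5, 2.0],
    [0.1, 2.0, 5.0],
]
total = [sum(col) for col in zip(*variables)]  # value of the sum on each outcome

for p in (1, 2, 3, 5):
    lhs = p_norm(total, probs, p)
    rhs = sum(p_norm(v, probs, p) for v in variables)
    print(p, lhs, rhs)  # lhs never exceeds rhs; equality at p = 1
```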


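Using this inequality on Eq. (72), the upper bound to the random variable of computation, we have as a bound on the moments the following:

$$\left[\,\overline{C^p}\,\right]^{1/p} \le (b + 1) \sum_{i=0}^{\infty} \sum_{s=0}^{\infty} \left\{ \left[\, \overline{\Big(\sum_{m=1}^{M(s)} z_{i,s}(m)\Big)^p}\, \right]^{1/p} + \left[\, \overline{\Big(\sum_{m=1}^{M(s)} z_{-i,s}(m)\Big)^p}\, \right]^{1/p} \right\} \tag{75}$$

where $M(s)$ is defined by Eq. (1) and we use the fact that $z_{i,s}(m) \ge 0$.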

Evaluating the moments without using Minkowski's Inequality seems to be a practical impossibility because of the number of cross terms which occur. With this inequality we reduce the problem to that of computing moments of computation on incorrect paths at the same length with the same threshold, namely, $\overline{\big(\sum_{m=1}^{M(s)} z_{i,s}(m)\big)^p}$. If $p$ is an integer, the latter term may be expanded as follows:

$$\overline{\Big(\sum_{m=1}^{M(s)} z_{i,s}(m)\Big)^p} = \sum_{m_1=1}^{M(s)} \cdots \sum_{m_p=1}^{M(s)} \overline{z_{i,s}(m_1) \cdots z_{i,s}(m_p)} \tag{76}$$

where the terms in the expansion are expectations of a composite characteristic function or probabilities. Since an expansion of this type does not apply to fractional $p$, we shall limit our attention to integer $p$.

In following sections, the first term in Eq. (75) will be overbounded. Since the first and second terms differ only in the sign of the index $i$, we shall find that the bound on the first term can be applied with minor modification to the second term of Eq. (75).


C.  PRELIMINARY  COUNTING  ARGUMENTS 

The  two  terms  in  Eq.  (75)  differ  in  the  sign  of  the  index  i.  This  section  will  deal  primarily 
with  the  first  term,  but  the  discussion  here  may  also  be  applied  directly  to  the  second  term. 
We are considering the term

$$\sum_{i=0}^{\infty} \sum_{s=0}^{\infty} \left[\, \overline{\Big(\sum_{m=1}^{M(s)} z_{i,s}(m)\Big)^p}\, \right]^{1/p} \tag{77}$$

The $p$th moment term has been expanded in Eq. (76) for integer $p$, the only case considered.

$$\overline{\Big(\sum_{m=1}^{M(s)} z(m)\Big)^p} = \sum_{m_1=1}^{M(s)} \cdots \sum_{m_p=1}^{M(s)} \overline{z(m_1) \cdots z(m_p)} \tag{78}$$

The subscripts $i$, $s$ have been dropped for the remainder of this section.

In Eq. (78), the terms corresponding to $(m_1, m_2, m_3, m_4) = (1, 10, 4, 10)$ and $(4, 1, 10, 1)$ for the case $p = 4$ are equal since $z^2(m) = z(m) = 1$ or $0$ and the ordering of characteristic functions in the product does not affect the value of the product. This suggests that many terms in Eq. (78) are equal, since the indices $(m_1, \ldots, m_p)$ are dummy variables. Let us now consider the multiplicity of a particular term.

Assume that the $p$-tuple of indices $(m_1, m_2, \ldots, m_p)$ contains $t \le p$ distinct elements $\{\Theta_1, \Theta_2, \ldots, \Theta_t\}$. (Each $\Theta_a$ corresponds to a particular incorrect path of $s$ branches.) Since $z(m_1) \cdots z(m_p) = z(\Theta_1) \cdots z(\Theta_t)$, all $p$-tuples with the set $\{\Theta_1, \ldots, \Theta_t\}$ as distinct elements have corresponding terms which are equal. Let $W(t, p)$ be the number of such $p$-tuples. This number is independent of the particular elements in the set of $t$ distinct elements. We bound $W(t, p)$.

$W(t, p)$ may be viewed as the number of ways of placing one ball in each of $p$ distinguishable cells where the balls are of $t$ different colors and each color must appear at least once. The number of such collections of $p$ balls is less than the number of collections one would have if we include the situations where one or more colors do not appear. This larger number is the number of ways of placing $t$ different elements in each of $p$ distinguishable cells, or $t^p$. Therefore, $W(t, p) \le t^p$.

To underbound $W(t, p)$, we now establish that $W(t, p) \ge t\,W(t, p - 1)$. Consider $W(t, p - 1)$, the number of ways $(p - 1)$ balls of $t$ different colors may be placed in $(p - 1)$ cells. Consider extending the collection by placing one additional ball with one of the $t$ colors in a $p$th cell. This new collection contains $t\,W(t, p - 1)$ items. It must contain fewer items than does the collection of $W(t, p)$ items because one color appears at least twice and every other color at least once, establishing the desired bound. Iterating this lower bound $(p - t)$ times and observing that $W(t, t) = t!$ we have $W(t, p) \ge t^{p-t}\, t!$. The two bounds are summarized in the following lemma.

Lemma 3.

The number W(t, p) of different p-tuples $(m_1, \ldots, m_p)$ generated from the set of t distinct elements $\{\Theta_1, \Theta_2, \ldots, \Theta_t\}$, each element appearing at least once, has the following bounds:

$$ \sqrt{2\pi t}\; e^{-t}\, t^{p} \le W(t, p) \le t^{p} . \tag{79} $$

Proof.

We use the fact [19] that

$$ t! \ge \sqrt{2\pi t}\; t^{t} e^{-t} . $$
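
The two bounds of Lemma 3 can be checked numerically for small t and p. The sketch below is an illustration only: it computes W(t, p) exactly by inclusion-exclusion (a standard counting identity, not the argument used in the text), and the function names `count_tuples` and `lemma3_holds` are hypothetical.

```python
from math import comb, exp, factorial, pi, sqrt


def count_tuples(t: int, p: int) -> int:
    """Exact W(t, p): p-tuples over t symbols in which every symbol appears.

    Uses inclusion-exclusion over the set of symbols that are missing.
    """
    return sum((-1) ** j * comb(t, j) * (t - j) ** p for j in range(t + 1))


def lemma3_holds(t: int, p: int) -> bool:
    """Check sqrt(2 pi t) e^-t t^p <= t^(p-t) t! <= W(t, p) <= t^p."""
    w = count_tuples(t, p)
    stirling_lower = sqrt(2 * pi * t) * exp(-t) * t ** p
    iterated_lower = t ** (p - t) * factorial(t)
    return stirling_lower <= iterated_lower <= w <= t ** p


if __name__ == "__main__":
    for p in range(1, 9):
        for t in range(1, p + 1):
            assert lemma3_holds(t, p), (t, p)
    print("Lemma 3 bounds hold for 1 <= t <= p <= 8")
```

The intermediate quantity $t^{p-t}\, t!$ is the bound obtained by iterating $W(t, p) \ge t\, W(t, p-1)$; the Stirling estimate then gives the left-hand side of Eq. (79).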
The second and final counting argument anticipates results to be obtained in the next section. First, however, let us rewrite Eq. (78) in terms of W(t, p).

$$ \overline{\left( \sum_{m=1}^{M(s)} z(m) \right)^{p}} = \sum_{t=1}^{\min[M(s),\,p]} W(t, p) \sum_{\substack{\text{all sets of } t \\ \text{distinct elements}}} \overline{z(\Theta_1) \cdots z(\Theta_t)} \tag{80} $$

The upper limit on t indicates that the number of distinct elements in a p-tuple $(m_1, m_2, \ldots, m_p)$ cannot exceed either p or M(s), the number of values of each index. In constructing the sets of t distinct elements $\{\Theta_1, \Theta_2, \ldots, \Theta_t\}$, we draw each $\Theta_a$ from a set of M(s) items. They correspond to nodes at penetration s in the incorrect subset and are otherwise labeled as $(\Theta_a, s)$, $1 \le a \le t$.

The terms $\overline{z(\Theta_1) \cdots z(\Theta_t)}$ in Eq. (80) are probabilities defined on t distinct paths at penetration s in the incorrect subset. These t paths are composed of a number of branches which is less than or equal to ts, since some paths may have branches in common. (See Fig. 15 where the paths involved are checked.) The next section will show that $\overline{z(\Theta_1) \cdots z(\Theta_t)}$ may be bounded in terms of the number of branches on the paths $\{\Theta_1, \ldots, \Theta_t\}$. That being the case, any two sets of t different paths with the same number of branches will have the same bound. We now proceed to count the number of sets $\{\Theta_1, \ldots, \Theta_t\}$ with an equal number of branches.

The paths $\{\Theta_1, \ldots, \Theta_t\}$ may be visualized by placing a check next to each of these paths (of length s) in the tree. Above every branch on a path ending with a check place a 1 (see Fig. 15). The number of such ones equals the number of branches on these t paths. Let $\alpha_r$ be the number of ones on branches at length r from the reference node and define $\bar\alpha$ by $\bar\alpha = (\alpha_1, \ldots, \alpha_r, \ldots, \alpha_s)$.

Fig. 15. Topology of tree paths.

In terms of $\bar\alpha$, the number of branches on the t paths $\{\Theta_1, \ldots, \Theta_t\}$ equals $\alpha \triangleq \sum_{r=1}^{s} \alpha_r$. Let $N_t(\alpha)$ be the number of sets of t distinct paths $\{\Theta_1, \ldots, \Theta_t\}$ which contain α branches. The following lemma bounds $N_t(\alpha)$.

Lemma 4.

$$ N_t(\alpha) \le (t-1)!\, (s+1)^{t-2}\, 2^{\alpha \ell R} \tag{81} $$

where $\alpha = \sum_{r=1}^{s} \alpha_r$; α ranges between $s \le \alpha \le st$.


Proof.

The proof is by construction. We first show that $N_t(\alpha) \le (t-1)!\, s^{t-2}\, b^{\alpha}$ for $s \ge 1$. Consider placing the first of the t paths into the incorrect subset of the tree (containing $M(s) \le b^{s}$ paths). It may assume no more than $b^{s}$ positions. A second path connecting with the first, but having $d_1$ separate branches, may assume any one of $b^{d_1}$ positions since its point of connection to the first path is fixed by its length $d_1$. A third path with $d_2$ branches distinct from the first two may connect to either path and terminate in one of $b^{d_2}$ positions, that is, it can assume no more than $2 b^{d_2}$ places. The t-th path, having $d_{t-1}$ branches distinct from the first t − 1 paths, may be connected to any one of them and may terminate in any one of $b^{d_{t-1}}$ positions; hence, it can be situated in no more than $(t-1)\, b^{d_{t-1}}$ places. Thus, given that the second path has $d_1$ branches distinct from the first, that the third path has $d_2$ branches distinct from the first and the second, etc., the number of arrangements of the t paths cannot exceed $(t-1)!\, b^{\alpha}$ where $\alpha = s + d_1 + d_2 + \cdots + d_{t-1}$ is the number of branches on these paths. All that remains is to determine the number of ways that values may be assigned to $d_1, d_2, \ldots, d_{t-2}$. (Note that $d_{t-1}$ is fixed given α and $d_1, \ldots, d_{t-2}$.) Since each number $d_i$ represents a portion of a path, $1 \le d_i \le s$, values may be assigned to $d_1, d_2, \ldots, d_{t-2}$ in no more than $s^{t-2}$ ways. Hence, the number of arrangements of t paths containing α branches cannot exceed $(t-1)!\, s^{t-2}\, b^{\alpha}$. Observing that $b = 2^{\ell R}$, we have the desired result for $s \ge 1$. We also have $s \le \alpha \le st$ since one path contains s branches and the number of branches on all paths cannot exceed st. Now, when s = 0, the bound on $N_t(\alpha)$ is zero. We cannot let this bound be zero since M(0) = 1, and we must include the s = 0 term. Therefore, replace s by (s + 1). Q. E. D.

As mentioned above, the results of the following section show that $\overline{z(\Theta_1) \cdots z(\Theta_t)}$ may be overbounded in terms of α. Let this bound be $Q_{i,s}(\alpha)$. We terminate this section by using the counting arguments introduced here to bound Eq. (76).

$$ \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \le \sum_{t=1}^{\min[M(s),\,p]} W(t, p) \sum_{\alpha=s}^{st} N_t(\alpha)\, Q_{i,s}(\alpha) \tag{82} $$

where W(t, p) and $N_t(\alpha)$ are bounded by Lemmas 3 and 4, respectively. From Lemma 4, the number of values of α cannot exceed st.
D. PROBABILITY TERM

The purpose of this section is to overbound the probability $\overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)}$ and show that this bound depends on the tree paths $\bar\Theta_1, \ldots, \bar\Theta_t$ only through α, the number of branches which they contain. We call this bound $Q_{i,s}(\alpha)$.

Before we proceed, it is useful to repeat the definition of the random variable $z_{i,s}(\Theta_a)$. From Eq. (71) we have

$$ z_{i,s}(\Theta_a) = \begin{cases} 1 & \text{if } d(\Theta_a, s) \ge T_{i-1} \text{ and } d(u_{r_0}, v_{r_0}) \le T_{i+1} \text{ for some } r_0 \\ 0 & \text{otherwise} \end{cases} \tag{71} $$


The expectation of a product of characteristic functions such as $\overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)}$ is the joint probability of the events on which each characteristic function has value one. Thus, we have that $\overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)}$ is the probability that $d(\Theta_1, s) \ge T_{i-1}$, $d(\Theta_2, s) \ge T_{i-1}$, ..., $d(\Theta_t, s) \ge T_{i-1}$, $d(u_{r_0}, v_{r_0}) \le T_{i+1}$ for $r_0$ = 1 or 2 or 3 or .... This is the probability of the union (on $r_0$) of a set of intersections. This may be overbounded by the sum of the probabilities of the various intersections. Therefore, we have

$$ \overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)} \le \sum_{r_0=1}^{\infty} \Pr\left[ d(\Theta_1, s) \ge T_{i-1}, \ldots, d(\Theta_t, s) \ge T_{i-1},\; d(u_{r_0}, v_{r_0}) \le T_{i+1} \right] . \tag{83} $$

Let us reduce Eq. (83) to a more manageable form. We introduce two lemmas to aid in this task. The first is a probabilistic statement and the second is a form of the Chernoff Inequality.

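Lemma 5.

Let $\{w_h\}$, $1 \le h \le H$, be a set of random variables and $\{W_h\}$, $\{\sigma_h\}$, $1 \le h \le H$, sets of constants. Then,

$$ \Pr[w_1 \ge W_1, w_2 \ge W_2, \ldots, w_H \ge W_H] = \Pr[\sigma_1 w_1 \ge \sigma_1 W_1, \ldots, \sigma_H w_H \ge \sigma_H W_H] \le \Pr\left[ \sum_{h=1}^{H} \sigma_h w_h \ge \sum_{h=1}^{H} \sigma_h W_h \right] \tag{84} $$

where for $\sigma_h < 0$ the inequality $w_h \ge W_h$ is replaced by the opposite inequality.

Proof.

The equality follows immediately. The inequality follows since the second event is implied by the first.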

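Lemma 6.

Let w be a random variable and W some constant. Then, for $\sigma \ge 0$,

$$ \Pr(w \ge W) \le 2^{-\sigma W}\; \overline{2^{\sigma w}} . \tag{85} $$

Proof.

$$ \overline{2^{\sigma w}} \ge \sum_{w \ge W} 2^{\sigma w} p(w) \ge 2^{\sigma W} \sum_{w \ge W} p(w) . $$

Q. E. D.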
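
As a numerical illustration of Lemma 6, the sketch below compares the exact tail probability of a small discrete random variable with the base-2 Chernoff bound for several values of σ. The distribution and the function name `chernoff_bound` are hypothetical choices made only for this example.

```python
def chernoff_bound(values, probs, threshold, sigma):
    """Return (exact Pr[w >= threshold], 2^(-sigma*threshold) * E[2^(sigma*w)])."""
    tail = sum(p for w, p in zip(values, probs) if w >= threshold)
    mgf = sum(p * 2.0 ** (sigma * w) for w, p in zip(values, probs))
    return tail, 2.0 ** (-sigma * threshold) * mgf


if __name__ == "__main__":
    # Hypothetical five-point distribution, used only for illustration.
    values = [-2, -1, 0, 1, 3]
    probs = [0.1, 0.2, 0.4, 0.2, 0.1]
    for sigma in (0.0, 0.5, 1.0, 2.0):
        tail, bound = chernoff_bound(values, probs, threshold=1, sigma=sigma)
        assert tail <= bound + 1e-12
        print(f"sigma={sigma}: tail={tail:.3f} <= bound={bound:.3f}")
```

The bound holds for every σ ≥ 0, which is why the text is free to choose σ later for convenience rather than optimize it.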

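Equation (83) is overbounded with the aid of Lemmas 5 and 6. We use Lemma 5 with H = t + 1, $\sigma_a \ge 0$ for $1 \le a \le t$ and $\sigma_{t+1} = \sigma_0 \le 0$. Then,

$$ \overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)} \le 2^{-\left( \sum_{a=1}^{t} \sigma_a \right) T_{i-1} - \sigma_0 T_{i+1}} \times \sum_{r_0=1}^{\infty} \overline{2^{\sum_{a=1}^{t} \sigma_a d(\Theta_a, s) + \sigma_0 d(u_{r_0}, v_{r_0})}} \tag{86} $$

Any optimization now or later of the parameters $\sigma_a$, $1 \le a \le t$, is too difficult to be rewarding. Therefore, we let $\sigma_a = 1/(1+t)$, $1 \le a \le t$, since this selection leads to meaningful results. Recognizing that $T_i = T_0 + i t_0$ (with $T_0 = 0$), and remembering that $d(\Theta_a, s) = I(\bar\Theta_a, v_s) - s \ell R$, $d(u_{r_0}, v_{r_0}) = I(u_{r_0}, v_{r_0}) - r_0 \ell R$ from Eqs. (69) and (70), where $\bar\Theta_a$ is the set of s branches preceding $(\Theta_a, s)$, we further reduce Eq. (86). (It should be remembered that $t_0$ is the separation between criteria whereas t is a variable.)

$$ \overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)} \le 2^{\,t_0 \left( \frac{t}{1+t} - \sigma_0 \right)}\; 2^{-i t_0 \left( \frac{t}{1+t} + \sigma_0 \right)}\; 2^{-s \ell R \frac{t}{1+t}} \times \sum_{r_0=1}^{\infty} 2^{-\sigma_0 r_0 \ell R}\; \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s) + \sigma_0 I(u_{r_0}, v_{r_0})}} \tag{87} $$

where tree paths $\bar\Theta_a$, $1 \le a \le t$, are of length s. [Note that $\Theta_a$ indicates the node $(\Theta_a, s)$, whereas $\bar\Theta_a$ is a tree path of s branches preceding $(\Theta_a, s)$.]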

Now  focus  attention  on  the  expectation  in  Eq.  (87).  The  various  bounding  techniques  and 
choices  of  parameters  to  follow  are  justified  by  the  end  result.  The  following  lemma  will  be 
needed: 

Lemma 7. (Hölder's Inequality)

Let $\{w_h\}$, $1 \le h \le H$, be a set of positive random variables and let $\{\nu_h\}$, $1 \le h \le H$, be a set of positive numbers satisfying

$$ \sum_{h=1}^{H} \frac{1}{\nu_h} = 1 . $$

Then,

$$ \overline{\prod_{h=1}^{H} w_h} \le \prod_{h=1}^{H} \left( \overline{w_h^{\nu_h}} \right)^{1/\nu_h} . $$

Proof. (See the Appendix)
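
A quick numerical check of Lemma 7 for the case H = 2 with exponents (1 + t)/t and 1 + t, the pair used later in this section, can be sketched as follows. The sample data are random positive numbers chosen only for illustration, averages are taken over equiprobable samples, and `holder_sides` is a hypothetical helper name.

```python
import random
from math import prod


def holder_sides(samples, exponents):
    """Return (E[prod_h w_h], prod_h (E[w_h^nu_h])^(1/nu_h)) for equiprobable samples.

    samples   : list of tuples (w_1, ..., w_H) of positive numbers
    exponents : positive numbers nu_h with sum of 1/nu_h equal to 1
    """
    n = len(samples)
    lhs = sum(prod(row) for row in samples) / n
    rhs = prod(
        (sum(row[h] ** nu for row in samples) / n) ** (1.0 / nu)
        for h, nu in enumerate(exponents)
    )
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    t = 3
    exponents = ((1 + t) / t, 1 + t)  # 1/nu_1 + 1/nu_2 = t/(1+t) + 1/(1+t) = 1
    samples = [(rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)) for _ in range(1000)]
    lhs, rhs = holder_sides(samples, exponents)
    assert lhs <= rhs * (1 + 1e-12)
    print(f"E[w1*w2] = {lhs:.4f} <= {rhs:.4f}")
```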


The expectation taken in Eq. (87) is over the ensemble of correct and incorrect sequences and received sequences. Let v be a received sequence which includes $v_{r_0}$ and $v_s$, that is, v contains more than $r_0$ or s branches. We may visualize the average in Eq. (87) as consisting of two successive averages, the first taken over the correct and incorrect sequences with the received sequence fixed (indicated with $\overline{\;\cdot\;}^{\,|v}$), the second average taken over the received sequence v (indicated with $\overline{\;\cdot\;}^{\,v}$). With v fixed, correct and incorrect sequences are statistically independent by construction of the "random code" ensemble. This implies that

$$ \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s) + \sigma_0 I(u_{r_0}, v_{r_0})}}^{\,|v} = \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s)}}^{\,|v} \times \overline{2^{\sigma_0 I(u_{r_0}, v_{r_0})}}^{\,|v} \tag{88} $$

where the averages are conditioned on v. The average in Eq. (87) is the average of Eq. (88) over v. We overbound the average in Eq. (87) using Lemma 7, where the average of that lemma should be considered as an average on v. We have H = 2 and we let $\nu_1 = (1+t)/t$, $\nu_2 = 1 + t$. Then, we have for the expectation in Eq. (87),

$$ \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s) + \sigma_0 I(u_{r_0}, v_{r_0})}} \le \left[ \overline{\left( \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s)}}^{\,|v} \right)^{(1+t)/t}}^{\,v} \right]^{t/(1+t)} \left[ \overline{\left( \overline{2^{\sigma_0 I(u_{r_0}, v_{r_0})}}^{\,|v} \right)^{1+t}}^{\,v} \right]^{1/(1+t)} \tag{89} $$

Here the average is first carried out over the ensemble of codes with the received sequence fixed and then over the received sequence. Final arguments in this section are concerned with evaluating and bounding these two terms.

From Eq. (69), we have

$$ I(\bar\Theta_a, v_s) = \sum_{r=1}^{s} \sum_{h=1}^{\ell} \log_2 \frac{p[v_{rh}/\theta_{rh}]}{f(v_{rh})} \tag{90} $$

where $v_{rh}$, $\theta_{rh}$ are the h-th digits on the r-th branch of $v_s$, $\bar\Theta_a$ respectively, each of s branches. An equivalent statement applies [from Eq. (70)] when $\bar\Theta_a$ is replaced by the correct path $u_{r_0}$.

Over the ensemble of codes, digits on correct and incorrect paths are statistically independent and identically distributed with probability assignment $\{p_k\}$. We evaluate the second factor in Eq. (89) by observing that $I(u_{r_0}, v_{r_0})$ is a sum of statistically independent random variables each of which assumes values

$$ \log_2 \frac{p[y_j/x_k]}{f(y_j)} . $$

Conditioned upon v, each of these $r_0 \ell$ random variables assumes value

$$ \log_2 \frac{p[y_j/x_k]}{f(y_j)} $$

with probability

$$ \Pr[x_k/y_j] = \frac{p_k\, p[y_j/x_k]}{f(y_j)} \tag{91} $$

when the corresponding received digit is $y_j$. We recall that $f(y_j)$ is the probability of channel output symbol $y_j$ when input symbols are assigned probabilities $\{p_k\}$. For the second factor in Eq. (89), we have

$$ \left[ \overline{\left( \overline{2^{\sigma_0 I(u_{r_0}, v_{r_0})}}^{\,|v} \right)^{1+t}}^{\,v} \right]^{1/(1+t)} = \left\{ \sum_{j=1}^{J} f(y_j) \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{1+\sigma_0} \right]^{1+t} \right\}^{r_0 \ell/(1+t)} \tag{92} $$

$$ = 2^{\,r_0 \ell\, \mu_t(\sigma_0)} \tag{93} $$

where

$$ \mu_t(\sigma_0) = \frac{1}{1+t} \log_2 \sum_{j=1}^{J} f(y_j) \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{1+\sigma_0} \right]^{1+t} \tag{94} $$

Before evaluating the first factor in Eq. (89), let us observe that several of the t paths $\{\bar\Theta_1, \ldots, \bar\Theta_t\}$ at penetration s may have branches in common. We recall that in the previous section we identified branches on the paths $\{\bar\Theta_1, \ldots, \bar\Theta_t\}$ by placing a 1 above each (see Fig. 15). We then defined $\alpha_r$ as the number of branches at length r, that is, the number of 1's on branches at length r. Since $\alpha_r \le t$, a branch at length r may belong to more than one of the t terminal paths. Let $\delta_n$ be the number of terminal paths containing the n-th of the $\alpha_r$ branches at length r, $1 \le n \le \alpha_r$. Since the total number of terminal paths is t, we have

$$ \sum_{n=1}^{\alpha_r} \delta_n = t \tag{95} $$

(The dependence of $\delta_n$ on r is implicit.) Call this n-th branch at length r $\varphi_n$ and let $\varphi_{nh}$ be the h-th digit (of ℓ digits) on this branch. Then, in Eq. (89) we have

$$ \sum_{a=1}^{t} I(\bar\Theta_a, v_s) = \sum_{r=1}^{s} \sum_{n=1}^{\alpha_r} \delta_n \sum_{h=1}^{\ell} \log_2 \frac{p[v_{rh}/\varphi_{nh}]}{f(v_{rh})} \tag{96} $$
Over the ensemble of codes, the tree digits are statistically independent and identically distributed and drawn with probabilities $\{p_k\}$. Since $\varphi_{nh}$ is a digit on a branch in the incorrect subset, it is statistically independent of the corresponding transmitted digit and of the corresponding received digit $v_{rh}$. Therefore, sets of digits $(\varphi_{n1}, \ldots, \varphi_{n\ell})$ are statistically independent of one another as are the digits in each set. Digit $v_{rh}$ assumes value $y_j$ with probability $f(y_j)$ given by Eq. (64). This is the same function $f(y_j)$ appearing in the definitions of the metric. The conditional expectation in the first term in Eq. (89) becomes:

$$ \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s)}}^{\,|v} = \prod_{r=1}^{s} \prod_{h=1}^{\ell} \prod_{n=1}^{\alpha_r} \overline{\left( \frac{p[v_{rh}/\varphi_{nh}]}{f(v_{rh})} \right)^{\delta_n/(1+t)}}^{\,|v} \tag{97} $$

$$ = \prod_{r=1}^{s} \prod_{h=1}^{\ell} \left[ \prod_{n=1}^{\alpha_r} \sum_{k=1}^{K} p_k \left( \frac{p[v_{rh}/x_k]}{f(v_{rh})} \right)^{\delta_n/(1+t)} \right] \tag{98} $$

But the digits $v_{rh}$ are statistically independent; hence, the first term in Eq. (89) becomes, with the aid of Eq. (98), the following:

$$ \left[ \overline{\left( \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s)}}^{\,|v} \right)^{(1+t)/t}}^{\,v} \right]^{t/(1+t)} = \prod_{r=1}^{s} \left\{ \sum_{j=1}^{J} f(y_j) \prod_{n=1}^{\alpha_r} \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{\delta_n/(1+t)} \right]^{(1+t)/t} \right\}^{\ell t/(1+t)} \tag{99} $$

where we recognize that the random variables in the square brackets of Eq. (98) are statistically dependent.

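The above probability is not yet in usable form. As the first of two steps directed at putting it in usable form we use Hölder's Inequality (Lemma 7) on Eq. (99) where we identify $w_n$ with

$$ \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{\delta_n/(1+t)} \right]^{(1+t)/t} $$

and we let $\nu_n = t/\delta_n$. We note that

$$ \sum_{n=1}^{\alpha_r} \frac{1}{\nu_n} = \frac{1}{t} \sum_{n=1}^{\alpha_r} \delta_n = 1 $$

so that the $\nu_n$ satisfy the necessary constraint. Then,

$$ \left[ \overline{\left( \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s)}}^{\,|v} \right)^{(1+t)/t}}^{\,v} \right]^{t/(1+t)} \le \prod_{r=1}^{s} \prod_{n=1}^{\alpha_r} \left\{ \sum_{j=1}^{J} f(y_j) \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{\delta_n/(1+t)} \right]^{(1+t)/\delta_n} \right\}^{\ell \delta_n/(1+t)} \tag{100} $$

In the second step we define

$$ R_\beta \triangleq -\frac{1}{\beta} \log_2 \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+\beta)} \right)^{1+\beta} \tag{101} $$

and observe that terms in Eq. (100) can be rewritten as follows:

$$ \left\{ \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{\delta_n/(1+t)} \right)^{(1+t)/\delta_n} \right\}^{\ell \delta_n/(1+t)} = 2^{-\ell \frac{\delta_n \beta}{1+t} R_\beta} \tag{102} $$

where

$$ \beta = \frac{1 + t - \delta_n}{\delta_n} . \tag{103} $$

We note that $\beta \le t$ since $1 \le \delta_n \le t$.

Next, we deduce from the following lemma that $R_\beta \ge R_t$ for $\beta \le t$ so that $-R_\beta \le -R_t$ and Eq. (102) may be overbounded by replacing $R_\beta$ with $R_t$.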

Lemma 8.

$R_\beta$ as defined above is a monotone decreasing function of β for $\beta \ge 0$.

Proof. (See the Appendix.)
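
The monotone behavior stated in Lemma 8 can be observed numerically. The sketch below evaluates $R_\beta$ from Eq. (101) for a binary symmetric channel with equiprobable inputs; the crossover probability 0.05 and the function name `rate_R` are assumptions made only for this illustration.

```python
from math import log2


def rate_R(beta, inputs, channel):
    """R_beta = -(1/beta) log2 sum_j ( sum_k p_k P[y_j|x_k]^(1/(1+beta)) )^(1+beta).

    inputs  : input probabilities p_k
    channel : channel[k][j] = P[y_j | x_k]
    """
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(p_k * channel[k][j] ** (1.0 / (1.0 + beta))
                    for k, p_k in enumerate(inputs))
        total += inner ** (1.0 + beta)
    return -log2(total) / beta


if __name__ == "__main__":
    eps = 0.05  # assumed crossover probability
    bsc = [[1 - eps, eps], [eps, 1 - eps]]
    uniform = [0.5, 0.5]
    rates = [rate_R(b, uniform, bsc) for b in (0.5, 1, 2, 3, 4)]
    assert all(x > y for x, y in zip(rates, rates[1:]))
    for b, r in zip((0.5, 1, 2, 3, 4), rates):
        print(f"R_{b} = {r:.4f}")
```

For β = 1 and equiprobable inputs the expression reduces to $1 - \log_2\left(1 + 2\sqrt{\epsilon(1-\epsilon)}\right)$ on this channel, which gives a convenient spot check.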

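Replacing $R_\beta$ with $R_t$ in Eq. (102) and inserting this result into the inequality of Eq. (100), we have the following final bound:

$$ \left[ \overline{\left( \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s)}}^{\,|v} \right)^{(1+t)/t}}^{\,v} \right]^{t/(1+t)} \le 2^{\frac{s \ell t R_t}{1+t}}\; 2^{-\alpha \ell R_t} \tag{104} $$

where $\alpha = \sum_{r=1}^{s} \alpha_r$ is the number of branches on the set of paths $\{\bar\Theta_1, \ldots, \bar\Theta_t\}$ and we have used Eq. (95). Combining Eqs. (93) and (104) in Eq. (89) we have the following:

$$ \overline{2^{\frac{1}{1+t} \sum_{a=1}^{t} I(\bar\Theta_a, v_s) + \sigma_0 I(u_{r_0}, v_{r_0})}} \le 2^{\frac{s \ell t R_t}{1+t}}\; 2^{-\alpha \ell R_t}\; 2^{\,r_0 \ell \mu_t(\sigma_0)} \tag{105} $$

where $\mu_t(\sigma_0)$ and $R_t$ are given by Eqs. (94) and (101), respectively.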

Our last step, which is to use Eq. (105) in Eq. (87), is stated formally in the following theorem:

Theorem 4.

The probability $\overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)}$ is bounded by the following, where α is the number of branches on the tree paths of length s, $\{\bar\Theta_1, \ldots, \bar\Theta_t\}$:

$$ \overline{z_{i,s}(\Theta_1) \cdots z_{i,s}(\Theta_t)} \le Q_{i,s}(\alpha) = 2^{\,t_0 \left( \frac{t}{1+t} - \sigma_0 \right)}\; 2^{-i t_0 \left( \frac{t}{1+t} + \sigma_0 \right)} \times 2^{\frac{s \ell t}{1+t} (R_t - R)}\; 2^{-\alpha \ell R_t} \sum_{r_0=1}^{\infty} 2^{-r_0 \ell [\sigma_0 R - \mu_t(\sigma_0)]} \tag{106} $$

where $\sigma_0 \le 0$,

$$ R_t = -\frac{1}{t} \log_2 \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+t)} \right)^{1+t} \qquad \text{[Eq. (101)]} $$

and

$$ \mu_t(\sigma_0) = \frac{1}{1+t} \log_2 \sum_{j=1}^{J} f(y_j) \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{1+\sigma_0} \right]^{1+t} \qquad \text{[Eq. (94)]} $$

(We shall discuss the convergence of the sum in Eq. (106) later.)

This is the result at which this section has been directed. We have obtained a bound on the probability term which depends on the paths $\{\bar\Theta_1, \ldots, \bar\Theta_t\}$ only through α, the number of branches which they contain. An identical proof (which we do not include) shows that the probability term corresponding to negative values of i differs from the bound above only in the sign of i and in the value of $\sigma_0$ (which we shall call $\sigma_1$).

The following section combines the results of this section with the counting arguments of the previous section to obtain the complete bound on the moments of "static" computation.


E.  BOUND  ON  MOMENTS 

The  purpose  of  this  section  is  to  combine  the  results  of  the  two  previous  sections,  thereby 
bounding  the  moments  of  computation. 

From Eq. (82) we have

$$ \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \le \sum_{t=1}^{\min[M(s),\,p]} W(t, p) \sum_{\alpha=s}^{st} N_t(\alpha)\, Q_{i,s}(\alpha) \qquad \text{[Eq. (82)]} $$

The multiplicities W(t, p) and $N_t(\alpha)$ are bounded by Lemmas 3 and 4 which are repeated here in abbreviated form.

Lemma 3.

$$ \sqrt{2\pi t}\; e^{-t}\, t^{p} \le W(t, p) \le t^{p} \qquad \text{[Eq. (79)]} $$

Lemma 4.

$$ N_t(\alpha) \le (t-1)!\, (s+1)^{t-2}\, 2^{\alpha \ell R} \qquad \text{[Eq. (81)]} $$

The lower bound to the function W(t, p) was introduced in order to establish that the bound in Eq. (82) must grow approximately as $t^p$. To further overbound Eq. (82) we overbound min[M(s), p] by p. Since $M(s) = (b-1)\, b^{s-1}$ for $s \ge 1$, it grows rapidly with s and the minimum will equal p for most values of s. These observations lead to the following bound on Eq. (82):

$$ \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \le \sum_{t=1}^{p} t^{p} \sum_{\alpha=s}^{st} (t-1)!\, (s+1)^{t-2}\, 2^{\alpha \ell R}\, Q_{i,s}(\alpha) . \tag{107} $$

We are now prepared to use the results of the preceding section, Theorem 4, namely,

$$ Q_{i,s}(\alpha) = 2^{\,t_0 \left( \frac{t}{1+t} - \sigma_0 \right)}\; 2^{-i t_0 \left( \frac{t}{1+t} + \sigma_0 \right)}\; 2^{\frac{s \ell t}{1+t} (R_t - R)} \times 2^{-\alpha \ell R_t} \sum_{r_0=1}^{\infty} 2^{-r_0 \ell [\sigma_0 R - \mu_t(\sigma_0)]} \qquad \text{[Eq. (106)]} $$
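This bound and that given above yield

$$ \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \le \sum_{t=1}^{p} t^{p}\, 2^{\,t_0 \left( \frac{t}{1+t} - \sigma_0 \right)}\; 2^{-i t_0 \left( \frac{t}{1+t} + \sigma_0 \right)} (t-1)!\, (s+1)^{t-2} \times 2^{\frac{s \ell t}{1+t} (R_t - R)} \left[ \sum_{\alpha=s}^{st} 2^{-\alpha \ell (R_t - R)} \right] \left[ \sum_{r_0=1}^{\infty} 2^{-r_0 \ell [\sigma_0 R - \mu_t(\sigma_0)]} \right] \tag{108} $$

In the previous section (Lemma 8) we discussed $R_t$ and said that it was monotone decreasing with increasing t. If we choose $R < R_p$, then $R_t \ge R_p > R$ for $t \le p$ and each term in the sum on α is less than 1 and each is overbounded by $2^{-s \ell (R_t - R)}$. (We note that this largest term occurs at α = s which corresponds to the case where the paths $\bar\Theta_1, \ldots, \bar\Theta_t$ are one and the same.) Since the sum on α contains no more than t(s + 1) terms, we then have

$$ \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \le \sum_{t=1}^{p} t^{p}\, 2^{\,t_0 \left( \frac{t}{1+t} - \sigma_0 \right)}\; 2^{-i t_0 \left( \frac{t}{1+t} + \sigma_0 \right)}\; t!\, (s+1)^{t-1} \times 2^{-\frac{s \ell}{1+t} (R_t - R)} \sum_{r_0=1}^{\infty} 2^{-r_0 \ell [\sigma_0 R - \mu_t(\sigma_0)]} \tag{109} $$

We have yet to discuss whether the sum on $r_0$ above converges and if so, for what values of $\sigma_0$. The semi-invariant moment generating function $\mu_t(\sigma_0)$ is given by

$$ \mu_t(\sigma_0) = \log_2 \left\{ \sum_{j=1}^{J} f(y_j) \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{1+\sigma_0} \right]^{1+t} \right\}^{1/(1+t)} . \qquad \text{[Eq. (94)]} $$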
Using the following lemma, we find that $\mu_t(\sigma_0) \le \mu_p(\sigma_0)$ for $t \le p$. Thus, if there exists a $\sigma_0$ such that $\sigma_0 R - \mu_p(\sigma_0) > 0$ then $\sigma_0 R - \mu_t(\sigma_0) > 0$ for $t \le p$.

Lemma 9.

Let w be a positive random variable and $0 < \nu < \eta$. Then

$$ \left( \overline{w^{\nu}} \right)^{1/\nu} \le \left( \overline{w^{\eta}} \right)^{1/\eta} . \tag{110} $$

Proof. (See the Appendix.)

We must ascertain whether there exists a $\sigma_0 \le 0$ when $R < R_p$ such that $\sigma_0 R - \mu_p(\sigma_0)$ is positive. If so, the sum on $r_0$ in Eq. (109) converges. The next lemma will aid us in our determination.

Lemma 10.

The function $\sigma_0 R - \mu_p(\sigma_0)$, where $\mu_p(\sigma_0)$ is given by Eq. (94), is positive for $\sigma' < \sigma_0 < 0$ where $\mu_p(\sigma')/\sigma' = R$, and $\mu_p(\sigma_0)/\sigma_0$ is monotone increasing in $\sigma_0$.

Proof. (See the Appendix.)

We deduce from the monotonicity of $\mu_p(\sigma_0)/\sigma_0$ that $\sigma' < -p/(1+p)$ when $R < R_p$. Therefore, there exist $\sigma_1 < -p/(1+p)$, $\sigma_0 > -\tfrac{1}{2}$ such that $\sigma_1 R - \mu_p(\sigma_1) > 0$ and $\sigma_0 R - \mu_p(\sigma_0) > 0$ when $R < R_p$. We shall need these results soon.

In any further bounding of Eq. (109) we must consider the two polarities in i, namely $i \ge 0$, $i \le 0$. We bound Eq. (109) over the two ranges of the index i, using the monotonicity in t/(1 + t) (up), in $R_t$ (down) and in $\mu_t(\sigma_0)$ (up) with increasing t.

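Theorem 5.

For $i \ge 0$, $R < R_p$, and $\sigma_0 > -\tfrac{1}{2}$,

$$ \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \le 2^{\,t_0 (1 - \sigma_0)}\; 2^{-i t_0 \left( \frac{1}{2} + \sigma_0 \right)}\; p\, p!\, (s+1)^{p-1}\, p^{p} \times 2^{-\frac{s \ell}{1+p} (R_p - R)} \sum_{r_0=1}^{\infty} 2^{-r_0 \ell [\sigma_0 R - \mu_p(\sigma_0)]} \tag{111} $$

For $i \le 0$, replace $\sigma_0$ by $\sigma_1$ and $2^{-i t_0 (1/2 + \sigma_0)}$ by $2^{-i t_0 \left( \frac{p}{1+p} + \sigma_1 \right)}$.

Proof.

We note that $\tfrac{1}{2} \le t/(1+t) \le p/(1+p)$, using the lower bound for $i \ge 0$ and the upper bound for $i \le 0$.

Theorem 5 is now employed to compute the sum of the two terms in Eq. (75).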

Theorem  6. 

There exist $\sigma_0$, $\sigma_1 < 0$ such that the following is bounded for $R < R_p$:
E  E

i=0  s=l 


M(s) 

E 


ll/p 


.  \  m  =  l 


iVp 


M(s) 
.  \m  =  l 


-si(R^-R), 

^ \  l/p 


$(  E  (s+l)'‘<^/P'  2  <1+P>P  j(2°  ppl) 

/  oo 


<s=0 


^  ^-ri[cr^R-Pp(cr^)]> 

^r=l  J 


1 2 


/V-o 

*(e 


(112) 


Proof.

The discussion following Lemma 10 indicates that for $R < R_p$ there exist $\sigma_0 > -\tfrac{1}{2}$ and $\sigma_1 < -p/(1+p)$ such that $\sigma_0 R - \mu_p(\sigma_0) > 0$ and $\sigma_1 R - \mu_p(\sigma_1) > 0$. These first two conditions and the last two conditions guarantee convergence of the i and $r_0$ summations, respectively.

Q. E. D.

We conclude our discussion of the moments with the following theorem which summarizes the results of the last three sections. We recall the bound Eq. (75),

$$ \left( \overline{C^{p}} \right)^{1/p} \le \sum_{i=0}^{\infty} \sum_{s=0}^{\infty} \left\{ \left[ \overline{\left( \sum_{m=1}^{M(s)} z_{i,s}(m) \right)^{p}} \right]^{1/p} + \left[ \overline{\left( \sum_{m=1}^{M(s)} z_{-i,s}(m) \right)^{p}} \right]^{1/p} \right\} \qquad \text{[Eq. (75)]} $$

Theorem 7.

On the DMC, the p-th moment of computation with the Fano Sequential Decoding Algorithm is $\overline{C^{p}}$, which is considered as an average over the ensemble of tree codes, and is finite for $R < R_p$ where

$$ R_p = -\frac{1}{p} \log_2 \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+p)} \right)^{1+p} \qquad \text{[Eq. (101)]} $$

A bound to $\overline{C^{p}}$ is obtained by combining Eq. (75) with Theorem 6.


F.  COMPOSITE  BOUND  ON  DISTRIBUTION 

Our concern for the moments of computation was motivated earlier by the statement that the moments may be used with a form of Chebysheff's Inequality to bound the distribution of computation. We restate Lemma 1.

Lemma 1.

Let C be a positive random variable with moments $\overline{C^{p}}$. Then,

$$ \Pr[C \ge L] \le \frac{\overline{C^{p}}}{L^{p}} . \qquad \text{[Eq. (73)]} $$

Since the moments have been averaged over the ensemble of all tree codes, we have a bound on the distribution considered as an average over the ensemble of tree codes. Indicate this average with $\overline{\Pr[C \ge L]}$.

It has been shown above that $\overline{C^{p}}$ is finite for $R < R_p$. We cannot establish the exact behavior of $\overline{C^{p}}$ from our arguments since $\overline{C^{p}}$ has been overbounded. Therefore, we shall be content to consider only those moments, namely, first, second, ..., p-th, such that $R < R_p$. To avoid confusion let k indicate an arbitrary order of moment and define p by $R_{p+1} \le R < R_p$ (note that $R_p$ is monotone decreasing in increasing p). Therefore, moments of order $k \le p$ converge and may be used in bounding $\overline{\Pr[C \ge L]}$.


Fig.  16.  Bound  on  distribution. 


Given that moments of order $k \le p$ are to be used in bounding the ensemble average of the distribution of computation, we ask for that order of moment for which the bound is smallest. If the k-th order moment is used and $L < (\overline{C^{k}})^{1/k}$, then the bound on the distribution is greater than one, so that one must be used as a bound. Since $(\overline{C^{k}})^{1/k}$ increases with k (Lemma 7), the bound on the distribution must be one for $L \le \overline{C}$ and $\overline{C}/L$ for L just greater than $\overline{C}$. This bound is used for values L such that $\overline{C^{2}}/L^{2}$ exceeds $\overline{C}/L$. The point of intersection of these two curves occurs at $L = \overline{C^{2}}/\overline{C}$ (see Fig. 16). For values of L greater than this value, the second-order moment is used until $L = \overline{C^{3}}/\overline{C^{2}}$ at which point the third-order moment is applied, etc. In general, we use the k-th order moment for $\overline{C^{k}}/\overline{C^{k-1}} \le L \le \overline{C^{k+1}}/\overline{C^{k}}$. The composite bound is stated below (see Fig. 17).


Theorem 8.

Let C be the random variable of computation with moments $\overline{C^{k}}$ over the ensemble of tree codes; then, for $k \le p$, where $R_{p+1} \le R < R_p$,

$$ \overline{\Pr[C \ge L]} \le \begin{cases} 1 & L \le \overline{C} \\ \overline{C^{k}}/L^{k} & \overline{C^{k}}/\overline{C^{k-1}} \le L \le \overline{C^{k+1}}/\overline{C^{k}} \end{cases} \tag{113} $$

With probability equal to 0.9 a code of rate R chosen at random from the ensemble of codes will have $\Pr[C \ge L] \le 10\, \overline{\Pr[C \ge L]}$. Codes in the ensemble are assigned probabilities in such a way that digits in the code are statistically independent and identically distributed with probabilities $\{p_k\}$.

Fig. 17. Composite bound on distribution.

Proof.

The bound on the average distribution has been discussed above. The second statement follows from Markov's Inequality (a variant of Chebysheff's Inequality), namely, if x is a positive random variable

$$ \Pr[x \le a \overline{x}] = 1 - \Pr[x > a \overline{x}] \ge 1 - \frac{1}{a} $$

where x is a distribution of computation and a = 10.

The composite bound is the lower envelope of the bounds corresponding to the individual moments. For large L (the distribution parameter) the distribution behaves as $L^{-p}$ where p is the largest order moment which is guaranteed to converge. That is, p is such that

$$ R_{p+1} \le R < R_p . $$
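
The lower-envelope construction of the composite bound can be sketched in a few lines. The example below is illustrative only: it uses a small hypothetical discrete distribution for C (so that exact moments and tails are available for comparison), and the helper names `moments_of`, `composite_bound` and `tail` are assumptions, not part of the report.

```python
def moments_of(values, probs, max_order):
    """Moments E[C^k] for k = 1..max_order of a discrete random variable."""
    return [sum(p * c ** k for c, p in zip(values, probs))
            for k in range(1, max_order + 1)]


def composite_bound(moments, L):
    """Lower envelope min(1, min_k E[C^k] / L^k) over the available moments."""
    candidates = [1.0] + [m / L ** k for k, m in enumerate(moments, start=1)]
    return min(candidates)


def tail(values, probs, L):
    """Exact Pr[C >= L]."""
    return sum(p for c, p in zip(values, probs) if c >= L)


if __name__ == "__main__":
    # Hypothetical distribution of computation, for illustration only.
    values = [1, 2, 4, 8]
    probs = [0.5, 0.25, 0.15, 0.1]
    moments = moments_of(values, probs, max_order=3)
    for L in (1, 2, 4, 8, 16, 64):
        exact = tail(values, probs, L)
        bound = composite_bound(moments, L)
        assert exact <= bound + 1e-12
        print(f"L={L:3d}  Pr[C>=L]={exact:.3f}  bound={bound:.4f}")
```

Taking the minimum over k reproduces the switching rule of the text automatically: the k-th term becomes the smallest once L passes the ratio of the k-th to the (k − 1)-th moment.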
CHAPTER V

INTERPRETATION OF RESULTS AND CONCLUSIONS


This report is motivated by a concern for the computational requirements of the Fano Sequential Decoding Algorithm as reflected in the probability of a buffer overflow. This probability plays a central role in the design of the Fano decoder for two reasons:

(a) The probability of an overflow is much larger than the probability of an undetected error (errors without overflow);

(b) When overflows occur a serious break in the decoding process results.

Our particular concern with the overflow event is to determine its sensitivity to the storage capacity of the decoder, to the decoder's speed of operation, and to the signaling rate of the source. We have had to approach these questions indirectly to avoid difficult analytical problems. Our approach has been to consider a random variable of computation known as "static" computation C. We have over- and underbounded the probability distribution of "static" computation, $\Pr[C \ge L]$, and have shown that it behaves as $L^{-\alpha}$, $\alpha > 0$, for large L. The bounds to $\Pr[C \ge L]$ lead to bounds on α.

We shall describe an experiment performed at Lincoln Laboratory and indicate the correlation between this experiment and the analytical bounds on α. This will lead to a conjecture about the true tail behavior of $\Pr[C \ge L]$, i.e., the behavior of this probability for large L. We shall interpret the conjectured exponent α in terms of established bounds on exponents of probabilities of error, these exponents being derived from coding theorems.

In this chapter, we also establish a heuristic connection between the probability of buffer overflow and the distribution of "static" computation $\Pr[C \ge L]$. From this connection we indicate the sensitivities to buffer size, machine speed, and signaling rate which are displayed by the overflow probability. Finally, we introduce and discuss several research problems.

We begin this chapter with a discussion of the tail behavior of $\Pr[C \ge L]$.

A.  COMPUTATION  EXPONENT 


In Chapter III, a lower bound applying to all codes was found for $P_R[C \ge L]$. A lower bound
for codes of fixed composition was also found. We shall be concerned here only with the general
lower bound.

In Chapter IV, an overbound to $P_R[C \ge L]$ was found using the "random code" technique. It
was shown that a large fraction of the set of all tree codes have a distribution function
$P_R[C \ge L]$ which is less than some fixed multiple of the ensemble average of $P_R[C \ge L]$.

It was indicated by Example 2 of Chapter IV that the upper bound on $P_R[C \ge L]$ of that chapter
should be numerically weak. Because of the lower bounding technique described in Chapter III,
the same may be said for the lower bound. Example 2 did indicate, however, that the behavior of
the upper bound in the distribution parameter L should approximate the true (ensemble average)
tail behavior. We are thus motivated to consider the behavior of $P_R[C \ge L]$ with L for large L.
To study this behavior, we introduce a function e(R) called the computation exponent.


\[ e(R) \triangleq R \lim_{L \to \infty} \left( \frac{-\log_2 P_R[C \ge L]}{\log_2 L} \right) \tag{114} \]

Since $P_R[C \ge L]$ behaves as $L^{-\alpha}$ for large L, the exponent $\alpha$ is related to the
computation exponent e(R) by $\alpha = e(R)/R$. Multiplication by the rate R normalizes $\alpha$
so that e(R) is a bounded function.
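
As a hedged illustration of Eq. (114): with a finite set of thresholds the limit can only be approximated, for example by the least-squares slope of $-\log_2 P_R[C \ge L]$ against $\log_2 L$. The sketch below is not part of the report; the function names and the synthetic power-law tail with $\alpha = 2.5$ are assumptions chosen purely for illustration.

```python
import math

def tail_exponent(thresholds, exceed_fractions):
    """Least-squares slope of -log2 P[C >= L] versus log2 L (an estimate of alpha)."""
    xs = [math.log2(L) for L in thresholds]
    ys = [-math.log2(f) for f in exceed_fractions]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def computation_exponent(thresholds, exceed_fractions, rate):
    """e(R) = R * alpha, the normalization stated after Eq. (114)."""
    return rate * tail_exponent(thresholds, exceed_fractions)

# Hypothetical data: an exact power-law tail P[C >= L] = L^-2.5.
thresholds = [2 ** k for k in range(1, 9)]
fractions = [L ** -2.5 for L in thresholds]
alpha_hat = tail_exponent(thresholds, fractions)
```

On real data the slope would be fitted only over the largest thresholds that still have enough samples, since Eq. (114) concerns the limit of large L.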

We now use the definition of Eq. (114) on Theorems 3 and 8 to obtain upper and lower bounds,
respectively, to e(R). We note that e(R) is an implicit function of the code, since
$P_R[C \ge L]$ is a function of the code.

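Theorem 9.

On the completely connected DMC, a code cannot be found with a computation exponent
exceeding $\bar{e}(R)$, where

\[ \bar{e}(R) \triangleq (-\sigma_0)(R - I_{\min}) \tag{115} \]

and $\sigma_0$ is the solution to

\[ R = \max_k \frac{\gamma_k(\sigma_0)}{\sigma_0} \quad \text{for } -1 \le \sigma_0 \le 0. \qquad [\text{Eq. (60)}] \]

Here, $\gamma_k(\sigma)$ is given by

\[ \gamma_k(\sigma) \triangleq \log_2 \sum_{j=1}^{J} p[y_j/x_k]^{1+\sigma} f(y_j)^{-\sigma}, \qquad [\text{Eq. (38)}] \]

\[ I_{\min} \triangleq \min_{j,k} \log_2 \frac{p[y_j/x_k]}{f(y_j)}, \qquad [\text{Eq. (17)}] \]

and

\[ f(y_j) = \sum_{k=1}^{K} p_k\, p[y_j/x_k]. \qquad [\text{Eq. (64)}] \]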


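Theorem 10.

On the general DMC there exist codes with computation exponents greater than or equal to
$\underline{e}(R)$, where

\[ \underline{e}(R) = pR \quad \text{for } R_{p+1} \le R < R_p, \; p = 1, 2, 3, \ldots, \tag{116} \]

and

\[ R_p \triangleq -\frac{1}{p} \log_2 \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+p)} \right]^{1+p}. \qquad [\text{Eq. (104)}] \]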


The probabilities $\{p_k\}$ are the probabilities assigned to letters in codes in the "random code"
argument. They also appear implicitly in the definition of the path metric through the function
$f(y_j)$. The path metric on the path terminated by node (m, s, q) of the q-th incorrect subset,
d(m, s, q), is defined by

\[ d(m, s, q) = \sum_{r=1}^{n} \sum_{h=1}^{\ell} \left[ \log_2 \frac{p[v_{rh}/u_{rh}]}{f(v_{rh})} - R \right]. \tag{117} \]

Here $u_n$, n = q + s, represents the given tree path; $v_n$ represents the corresponding section of
the received sequence; and $u_{rh}$, $v_{rh}$ are the h-th digits on the r-th branches of $u_n$, $v_n$,
respectively.

Theorems 9 and 10 delimit the tail behavior of $P_R[C \ge L]$ as measured with the computation
exponent e(R): $e(R) \le \bar{e}(R)$ for all codes on the completely connected DMC, and there exist
codes on the general DMC such that $e(R) \ge \underline{e}(R)$. We now consider the behavior of the
two bounds, $\bar{e}(R)$ and $\underline{e}(R)$, with the signaling rate R.

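First consider $\bar{e}(R)$. We wish to show that it is a monotone decreasing function of increasing
R. We recall from the discussion of Chapter III that $\gamma_k(\sigma_0)/\sigma_0$ is a monotone
increasing function of $\sigma_0$. This implies that $R = \max_k [\gamma_k(\sigma_0)/\sigma_0]$ is also
monotone increasing in $\sigma_0$. Moreover, $\gamma_k(\sigma_0)/\sigma_0$ is continuous in $\sigma_0$,
as is $R = \max_k [\gamma_k(\sigma_0)/\sigma_0]$. If we can show that
$\bar{e}(R) = (-\sigma_0)(R - I_{\min})$ is monotone decreasing in increasing $\sigma_0$, we will have
established that $\bar{e}(R)$ is a continuous decreasing function of R. The monotonicity of
$(-\sigma_0)(R - I_{\min})$ is established by considering its derivative in $\sigma_0$. The derivative
is taken at a value of $\sigma_0$ which is not a transition point of
$\max_k [\gamma_k(\sigma_0)/\sigma_0]$, that is, a point at which the index which achieves the
maximum is changing from $k = k_1$ to $k = k_2$. With k the index achieving the maximum,

\[ \frac{d}{d\sigma_0} \left[ (-\sigma_0)(R - I_{\min}) \right] = \frac{d}{d\sigma_0} \left[ -\gamma_k(\sigma_0) + \sigma_0 I_{\min} \right] = -\left[ \gamma_k'(\sigma_0) - I_{\min} \right] \tag{118} \]

where

\[ \gamma_k'(\sigma_0) = \frac{\sum_{j=1}^{J} p[y_j/x_k]^{1+\sigma_0} f(y_j)^{-\sigma_0} \log_2 \{ p[y_j/x_k]/f(y_j) \}}{\sum_{j=1}^{J} p[y_j/x_k]^{1+\sigma_0} f(y_j)^{-\sigma_0}}. \tag{119} \]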


We may underbound each of the $\log_2 \{ p[y_j/x_k]/f(y_j) \}$ appearing in Eq. (119) by the smallest
such term. By definition, this must exceed $I_{\min}$. Therefore, $\gamma_k'(\sigma_0) > I_{\min}$ and
$(-\sigma_0)(R - I_{\min})$ has a negative first derivative at values of $\sigma_0$ which are not
transition points. Since $(-\sigma_0)(R - I_{\min})$ is continuous in $\sigma_0$, we have that
$\bar{e}(R)$ is continuous and monotone decreasing in R.

At $\sigma_0 = -1$, R = 0 and $\bar{e}(R) = -I_{\min} > 0$. At $\sigma_0 = 0$,
$R = \lim_{\sigma_0 \to 0} \max_k [\gamma_k(\sigma_0)/\sigma_0] = \max_k \gamma_k'(0)$ [since
$\gamma_k(0) = 0$] and $\bar{e}(R) = 0$. These results are summarized in the following lemma.

Lemma 12.

The computation exponent upper bound $\bar{e}(R)$ is continuous and monotone decreasing in
increasing R. It decreases from $\bar{e}(R) = -I_{\min}$ at R = 0 to $\bar{e}(R) = 0$ at
$R = \max_k \gamma_k'(0)$.

The computation exponent bound $\bar{e}(R)$ is sketched in Fig. 18 for a typical channel and a
typical probability assignment.
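
The upper bound can be traced numerically by sweeping the parameter $\sigma_0$. The following is a minimal sketch, not the report's own procedure. It assumes the form $\gamma_k(\sigma) = \log_2 \sum_j p[y_j/x_k]^{1+\sigma} f(y_j)^{-\sigma}$, which is consistent with $\gamma_k(0) = 0$ and with the BSC expression quoted later in this chapter. The function names and the channel representation `channel[k][j] = p[y_j/x_k]` are illustrative choices.

```python
import math

def output_probs(probs, channel):
    """f(y_j) = sum_k p_k p[y_j/x_k]."""
    return [sum(p * row[j] for p, row in zip(probs, channel))
            for j in range(len(channel[0]))]

def gamma_k(sigma, k, probs, channel):
    """Assumed form: log2 sum_j p[y_j/x_k]^(1+sigma) * f(y_j)^(-sigma)."""
    f = output_probs(probs, channel)
    return math.log2(sum(pyx ** (1 + sigma) * fy ** (-sigma)
                         for pyx, fy in zip(channel[k], f)))

def upper_bound_point(sigma, probs, channel):
    """Return (R, e_bar(R)) for a parameter -1 <= sigma < 0 on a completely connected DMC."""
    f = output_probs(probs, channel)
    i_min = min(math.log2(pyx / fy) for row in channel for pyx, fy in zip(row, f))
    rate = max(gamma_k(sigma, k, probs, channel) / sigma for k in range(len(channel)))
    return rate, (-sigma) * (rate - i_min)

# Example: BSC with crossover 0.01 and equiprobable inputs.
p0 = 0.01
bsc = [[1 - p0, p0], [p0, 1 - p0]]
curve = [upper_bound_point(s, [0.5, 0.5], bsc) for s in (-1.0, -0.75, -0.5, -0.25, -0.05)]
```

Each entry of `curve` is one (R, exponent) pair; plotting them would give a curve of the kind sketched in Fig. 18.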

Fig. 18. Computation exponent upper bound $\bar{e}(R)$.

Fig. 19. Computation exponent lower bound $\underline{e}(R)$.

One may show that the rate at which $\bar{e}(R) = 0$, namely $\max_k \gamma_k'(0)$, may exceed channel
capacity. In contrast, if the assignment $\{p_k\}$ that achieves channel capacity is used, then
$\max_k \gamma_k'(0) = C$. We recall that channel capacity is defined as the maximum mutual
information between channel inputs and outputs. Let I(x,y) be the mutual information between
channel inputs and outputs; then

\[ C = \max_{\{p_k\}} I(x, y). \tag{120} \]

It has been shown that the $\{p_k\}$ which maximizes I(x,y) is such that

\[ \sum_{j=1}^{J} p[y_j/x_k] \log_2 \frac{p[y_j/x_k]}{f(y_j)} \le C \tag{121} \]

with equality when $p_k \ne 0$. Therefore, if this set $\{p_k\}$ is used in the definition of $f(y_j)$,
that is, in the definition of the path metric, then $\max_k \gamma_k'(0) = C$ and the rate at which
$\bar{e}(R) = 0$ is channel capacity.
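
The condition of Eq. (121) is easy to check numerically on a small channel. The sketch below is an illustration under assumptions, not material from the report. It evaluates the per-letter sums $\sum_j p[y_j/x_k] \log_2 (p[y_j/x_k]/f(y_j))$ for the BSC, once with the equiprobable assignment (which achieves capacity on this channel) and once with a deliberately skewed assignment. The function names are hypothetical.

```python
import math

def divergences(probs, channel):
    """D_k = sum_j p[y_j/x_k] log2(p[y_j/x_k] / f(y_j)) for each input letter k."""
    f = [sum(p * row[j] for p, row in zip(probs, channel))
         for j in range(len(channel[0]))]
    return [sum(pyx * math.log2(pyx / fy) for pyx, fy in zip(row, f) if pyx > 0)
            for row in channel]

def mutual_information(probs, channel):
    """I(x, y) = sum_k p_k D_k."""
    return sum(p * d for p, d in zip(probs, divergences(probs, channel)))

p0 = 0.01
bsc = [[1 - p0, p0], [p0, 1 - p0]]
capacity = 1 + p0 * math.log2(p0) + (1 - p0) * math.log2(1 - p0)  # 1 - H(p0)
d_uniform = divergences([0.5, 0.5], bsc)   # both sums equal the capacity
d_skewed = divergences([0.9, 0.1], bsc)    # the largest sum exceeds the capacity
```

With the skewed assignment the largest per-letter sum lies above capacity, which illustrates the remark that the rate at which the upper bound vanishes may exceed capacity when the metric is built from a non-optimal assignment.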

We shall now consider the behavior of $\underline{e}(R)$ with R. As given by Theorem 10,
$\underline{e}(R) = pR$ for $R_{p+1} \le R < R_p$, p = 1, 2, .... Fix p. Then, for
$R_{p+1} \le R < R_p$, $\underline{e}(R)$ increases with R on a line of slope p passing through the
origin. The full curve $\underline{e}(R)$ is sketched in Fig. 19. For R arbitrarily close to, but
less than, $R_p$, $\underline{e}(R) = pR_p$. We now show that the points $pR_p$ form an increasing
sequence for increasing p, whereas the $R_p$ form a decreasing sequence. This will
establish that the sketch of Fig. 19 is accurate.
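
The staircase shape of the lower bound can be reproduced with a few lines of code. This is a sketch under stated assumptions rather than the report's method: it takes $pR_p = -\log_2 \sum_j [\sum_k p_k\, p[y_j/x_k]^{1/(1+p)}]^{1+p}$ as in Eq. (122), uses hypothetical function names, and caps the search at an arbitrary `max_p`.

```python
import math

def p_times_rate(p, probs, channel):
    """p * R_p = -log2 sum_j [sum_k p_k p[y_j/x_k]^(1/(1+p))]^(1+p), as in Eq. (122)."""
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(pk * row[j] ** (1.0 / (1.0 + p)) for pk, row in zip(probs, channel))
        total += inner ** (1.0 + p)
    return -math.log2(total)

def rate_p(p, probs, channel):
    return p_times_rate(p, probs, channel) / p

def lower_bound_exponent(rate, probs, channel, max_p=1000):
    """Lower bound: p * R on R_{p+1} <= R < R_p for integer p, zero for R >= R_1."""
    if rate >= rate_p(1, probs, channel):
        return 0.0
    p = 1
    # Advance p until R_{p+1} <= rate; max_p is an arbitrary safety cap.
    while p < max_p and rate_p(p + 1, probs, channel) > rate:
        p += 1
    return p * rate

p0 = 0.01
bsc = [[1 - p0, p0], [p0, 1 - p0]]
uniform = [0.5, 0.5]
r_seq = [rate_p(p, uniform, bsc) for p in range(1, 21)]          # decreasing in p
pr_seq = [p_times_rate(p, uniform, bsc) for p in range(1, 21)]   # increasing in p
```

For the BSC with crossover 0.01, a rate of 0.5 falls between $R_3$ and $R_2$, so the bound there is 2R.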

From Lemma 8, $R_p$, $p \ge 0$, is monotone decreasing in increasing p. We show that $pR_p$ is
monotone increasing in p by showing that $2^{-pR_p}$ is monotone decreasing in p for a fixed set of
$\{p_k\}$.

\[ 2^{-pR_p} = \sum_{j=1}^{J} \left[ \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+p)} \right]^{1+p} \tag{122} \]

Lemma 9 is sufficient to establish the monotonicity of $2^{-pR_p}$. We repeat this lemma here.

Lemma 9.

Let w be a positive random variable and let $0 < \nu < \eta$. Then,

\[ \left( \overline{w^{\nu}} \right)^{1/\nu} \le \left( \overline{w^{\eta}} \right)^{1/\eta}, \]

where the overbar denotes expectation.

Therefore, if we apply this lemma to the sum over k for each j in Eq. (122), we find that
increasing p decreases $2^{-pR_p}$ or increases $pR_p$.

We now show that on the completely connected DMC, $pR_p$ has a well-defined, nonzero limit
as $p \to \infty$. For large p,

\[ p[y_j/x_k]^{1/(1+p)} = \exp \left\{ \frac{1}{1+p} \ln p[y_j/x_k] \right\} \approx 1 + \frac{1}{1+p} \ln p[y_j/x_k] \tag{123} \]

and

\[ \left[ \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+p)} \right]^{1+p} \approx \exp \left\{ (1+p) \ln \left[ 1 + \frac{1}{1+p} \sum_{k=1}^{K} p_k \ln p[y_j/x_k] \right] \right\} \approx \exp \left\{ \sum_{k=1}^{K} p_k \ln p[y_j/x_k] \right\}. \tag{124} \]

Therefore, on the completely connected DMC, as p becomes indefinitely large, $pR_p$ approaches

\[ -\log_2 \sum_{j=1}^{J} 2^{\sum_{k=1}^{K} p_k \log_2 p[y_j/x_k]}. \]

This implies that $R_p = pR_p/p$ approaches zero on the completely connected DMC. When the
channel is not completely connected, the limit of $pR_p$ as $p \to \infty$ may be infinite. This
implies that $R_p \to C_0 \ge 0$. These results are summarized in the following lemma.

Lemma 13.

The computation exponent lower bound $\underline{e}(R)$ is a set of straight lines of increasing
slope, $\underline{e}(R) = pR$ for $R_{p+1} \le R < R_p$, p = 1, 2, 3, .... On the completely connected
DMC the points $pR_p$ increase with decreasing $R_p$ to the following limits:

\[ \lim_{p \to \infty} R_p = 0 \]

\[ \lim_{p \to \infty} pR_p = -\log_2 \sum_{j=1}^{J} 2^{\sum_{k=1}^{K} p_k \log_2 p[y_j/x_k]}. \]

When the channel is not completely connected, $\lim_{p \to \infty} R_p = C_0$, where $C_0 \ge 0$ may be
strictly positive.


The largest rate for which $\underline{e}(R)$ is nonzero is $R_1$. For $R \ge R_1$, $\underline{e}(R)$
is zero. It will be obvious from a later discussion that $R_1 \le$ capacity.
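
The limiting behavior in Lemma 13 can be checked numerically on a completely connected channel. The following sketch assumes the closed form $\lim pR_p = -\log_2 \sum_j 2^{\sum_k p_k \log_2 p[y_j/x_k]}$, which is what Eqs. (122) to (124) yield, and compares it with $pR_p$ evaluated at a large but finite p. The function names and the choice p = 10^4 are illustrative.

```python
import math

def p_times_rate(p, probs, channel):
    """p * R_p from Eq. (122)."""
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(pk * row[j] ** (1.0 / (1.0 + p)) for pk, row in zip(probs, channel))
        total += inner ** (1.0 + p)
    return -math.log2(total)

def limit_p_times_rate(probs, channel):
    """Assumed closed-form limit; requires every p[y_j/x_k] > 0."""
    return -math.log2(sum(
        2.0 ** sum(pk * math.log2(row[j]) for pk, row in zip(probs, channel))
        for j in range(len(channel[0]))))

p0 = 0.01
bsc = [[1 - p0, p0], [p0, 1 - p0]]
uniform = [0.5, 0.5]
large_p = 1.0e4
finite_value = p_times_rate(large_p, uniform, bsc)
limit_value = limit_p_times_rate(uniform, bsc)
```

For the BSC with equiprobable inputs the closed form reduces to $-\log_2 (2\sqrt{p_0(1-p_0)})$, while $R_p = pR_p/p$ at p = 10^4 is already close to zero.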

As an example of the computation exponent bounds, we show in Fig. 20 the two exponents
$\bar{e}(R)$ and $\underline{e}(R)$ for the binary symmetric channel (BSC) with transition probability
$p_0 = 0.01$. We select $p_k = 1/2$, k = 1, 2. Since this assignment achieves channel capacity,
$\bar{e}(R) = 0$ at R = C. For this channel and the given assignment $\{p_k\}$, we have
$\bar{e}(R) = [p/(1+p)] [R_p + \log_2 (1/2p_0)]$, where $R = R_p$ and p assumes all values greater
than zero (not just the integers). At R = 0, $\bar{e}(R) = -I_{\min} = \log_2 (1/2p_0)$.

Fig. 20. Bounds on the computation exponent for BSC with $p_0 = 0.01$.

Fig. 21. Empirical distribution of computation.

In the next section we correlate the analytical results with an experiment.


B.  AN  EXPERIMENTAL  RESULT 

1  3 

A  computer  simulation  of  the  Fano  algorithm  was  run  recently  at  Lincoln  Laboratory  under 
the  direction  of  K.  L.  Jordan  who  has  made  data  from  this  experiment  available  to  the  author. 
These  data  represent  slightly  more  than  one  million  decoded  digits  on  the  BSC  with  p^  =  0.01  and 
have  been  used  to  compute  an  experimental  distribution  of  computation  (see  Fig.  21).  The  compu¬ 
tation  variable  measured  in  this  simulation  will  be  discussed  shortly.  It  suffices  to  say  that  it 
differs  somewhat  from  "static”  computation. 

In the experiment, a convolutional tree code of the type described in Chapter II with b = 2
was used. In the generator $g = (g_1, g_2, \ldots)$, $g_1$ was chosen to maximize the Hamming
distance between the two tree branches at the first node of the tree. Given $g_1$, $g_2$ was chosen
to maximize the minimum Hamming distance between the four codewords of two branches. Several
other subgenerators were chosen in this way. The remainder were chosen at random. The BSC
was simulated with a random number generator, and the decoder was assumed to have an
infinite buffer.

The  computation  variable  recorded  by  the  computer  is  best  defined  with  the  aid  of  two  im¬ 
aginary  pointers.  We  may  visualize  a  pointer  "extreme"  below  the  tree  code  indicating  the 
furthest  penetration  into  the  tree  made  by  the  decoder.  Another  pointer,  "search,"  below  the 
tree  indicates  the  depth  of  the  node  presently  being  examined  by  the  decoder.  The  search  pointer 
either  lies  on  or  behind  the  extreme  pointer.  Every  time  the  two  pointers  move  ahead  together 
in  the  tree,  the  computer  program  records  one  computation.  If  a  search  is  required,  the  ex¬ 
treme  pointer  remains  fixed  and  the  program  records  the  number  of  operations  required  before 
the  search  pointer  returns  to  the  extreme  pointer  and  the  two  move  ahead.  The  data  from  the 
simulation  are  reduced  and  the  computer  program  prints  out  the  number  of  times  the  computation 
exceeds $2^k$ for k = 0, 1, 2, .... In the particular run used by the author the signaling rate R was
j  bit  per  channel  use.  The  largest  number  of  computations  in  this  run  was  less  than  2  56  and 
greater  than  128  and  it  was  observed  that  the  search  pointer  never  drifted  back  more  than  45 
branches  from  the  extreme  pointer. 
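
The data reduction described above, counting how often the recorded computation exceeds each power of two, can be sketched as follows. This is an illustration only: the function name and the sample values are hypothetical and are not data from the Lincoln Laboratory run.

```python
def exceedance_counts(computations, max_k=None):
    """counts[k] = number of recorded computations that exceed 2**k, k = 0, 1, 2, ..."""
    if max_k is None:
        # bit_length of the largest sample guarantees the final count is zero
        max_k = max(computations).bit_length()
    return [sum(1 for c in computations if c > 2 ** k) for k in range(max_k + 1)]

# Hypothetical samples: one entry per advance of the extreme pointer.
samples = [1, 1, 2, 3, 5, 9, 130]
counts = exceedance_counts(samples)
```

Dividing each count by the number of samples gives an empirical distribution of the kind plotted in Fig. 21.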

Although  the  computation  recorded  by  the  program  is  not  "static"  computation,  we  shall 
argue  later  that  it  is  a  small  multiple  of  "static"  computation.  Since  this  multiple  does  not  affect 
the  tail  behavior  of  the  experimental  distribution,  we  are  justified  in  computing  the  computation 
exponent  for  the  experimental  distribution  and  comparing  this  exponent  to  the  bounds  of  Fig.  20. 
The  experimental  point  is  shown  in  Fig.  20.  Other  computer  runs  at  rates  R  =  i  ,  i  were  re¬ 
corded  but  large  computations  were  so  infrequent  that  the  data  were  not  considered  reliable  and 
were  not  used. 

In  the  next  section,  we  conjecture  about  the  true  value  of  the  computation  exponent. 

C.  A  CONJECTURE 

We are led to conjecture a form for the "true" computation exponent by consideration of the
experimental result of the last section and the derivation of the "random code" bound on the
distribution of "static" computation. In the discussion of this bound in Chapter IV, we limited
attention to integral moments of computation for analytical reasons. As a result of this
limitation, $\underline{e}(R)$ has the shape of Fig. 19. We now suggest that the true "random code"
computation exponent has the form $e^*(R) = pR_p$ when $R = R_p$ for all $p \ge 0$ (not just integer
p). We suggest that this is an exponent which may be achieved, that is, that codes can be found
with this exponent. (This is partially substantiated by the experimental point discussed in the
last section. The conjectured "random code" computation exponent and this point differ by only
5 percent at the signaling rate of that run for the BSC example.) Finally, we suggest that
$e^*(R)$ cannot be exceeded, that is, that no code exists with a computation exponent which
exceeds $e^*(R)$. These suggestions are summarized below.

Conjecture

The computation exponent $e^*(R)$,

\[ e^*(R) = pR_p, \quad R = R_p \text{ for } p \ge 0, \tag{125} \]

cannot be exceeded by any code used with the path metric of Eq. (117), and codes exist which
achieve this computation exponent.
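
Evaluating the conjectured exponent at a given rate requires solving $R_p = R$ for a real $p > 0$. A minimal sketch follows, assuming $pR_p$ as in Eq. (122) and using plain bisection, which is justified because $R_p$ decreases with p. The function names, bracket endpoints, and iteration count are arbitrary illustrative choices.

```python
import math

def p_times_rate(p, probs, channel):
    """p * R_p from Eq. (122)."""
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(pk * row[j] ** (1.0 / (1.0 + p)) for pk, row in zip(probs, channel))
        total += inner ** (1.0 + p)
    return -math.log2(total)

def rate_p(p, probs, channel):
    return p_times_rate(p, probs, channel) / p

def conjectured_exponent(rate, probs, channel, p_lo=1e-6, p_hi=1e6):
    """Solve R_p = rate for real p by bisection and return p * rate."""
    if rate >= rate_p(p_lo, probs, channel):
        return 0.0                                  # rate at or above I(x, y)
    if rate <= rate_p(p_hi, probs, channel):
        return p_times_rate(p_hi, probs, channel)   # rate near zero: limiting value
    lo, hi = p_lo, p_hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rate_p(mid, probs, channel) > rate:
            lo = mid    # R_p still too large, so p must increase
        else:
            hi = mid
    return 0.5 * (lo + hi) * rate

p0 = 0.01
bsc = [[1 - p0, p0], [p0, 1 - p0]]
uniform = [0.5, 0.5]
e_star_half = conjectured_exponent(0.5, uniform, bsc)
```

At integer p the result agrees with the corner points $pR_p$ of the staircase lower bound; between them it interpolates smoothly.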

The conjectured exponent $e^*(R)$ is a monotone decreasing function of R. This may be
deduced from the earlier discussion of the exponent $\underline{e}(R)$. The value of $e^*(R)$ at R = 0
is identical with the value of $\underline{e}(R)$ at R = 0. The exponent $e^*(R)$ is zero for p = 0 or
R = I(x,y), where I(x,y) is given by Eq. (120).

The  conjectured  exponent  of  this  section  is  interpreted  in  the  following  section  in  terms  of 
"list  decoding"  exponents  and  the  "sphere-packing"  exponent. 

D.  INTERPRETATION  OF  COMPUTATION  EXPONENT 

The conjectured computation exponent $e^*(R)$ has a simple interpretation in terms of the
"list decoding exponent," that is, the exponent† of the "random code" bound on the probability of
error with "list decoding" (Refs. 21-23).

"List decoding" is similar to maximum a posteriori decoding. We assume that one of
$M = 2^{nR}$ equally likely codewords is transmitted over the DMC. Here n is the code block length
in channel symbols and R is the signaling rate. At the receiving terminal, the decoder makes
a list of the k a posteriori most probable codewords given the received channel sequence. If
the transmitted codeword is not in this list of k codewords, an error is said to have occurred.
With "list decoding" the probability of error is reduced from the probability of error with
maximum a posteriori decoding, k = 1, by accepting some ambiguity in the transmitted message.
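
A toy version of the list decoder just described is sketched below. It relies on one stated assumption: on a BSC with crossover probability below 1/2 and equally likely codewords, ranking by a posteriori probability is the same as ranking by Hamming distance to the received sequence. The codebook and function names are hypothetical.

```python
def list_decode(received, codewords, k):
    """Return the k codewords closest in Hamming distance to `received`."""
    def distance(codeword):
        return sum(a != b for a, b in zip(codeword, received))
    return sorted(codewords, key=distance)[:k]

def list_error(transmitted, received, codewords, k):
    """A list-decoding error occurs when the transmitted word is not on the list."""
    return transmitted not in list_decode(received, codewords, k)

# Hypothetical codebook of M = 4 words of length n = 5.
codebook = ["00000", "11100", "00111", "11011"]
sent, got = "00000", "11000"   # two channel errors
error_k1 = list_error(sent, got, codebook, 1)
error_k2 = list_error(sent, got, codebook, 2)
```

In this example the single-word decoder (k = 1) errs, while a list of size two still contains the transmitted word, which is the trade of ambiguity for reliability noted above.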

The  probability  of  error  with  list  decoding  has  been  overbounded  using  a  "random  code" 
argument.  The  probability  of  error  is  averaged  over  the  ensemble  of  codes  by  assigning  to 
each  code  a  probability,  computed  as  if  each  letter  in  the  code  were  chosen  independently  with 
the assignment $\{p_k\}$, the assignment of Chapter IV. The ensemble average of the probability of
error with list size k, $\overline{P_k(\epsilon)}$, k = 1, 2, 3, ..., has the following bound

\[ \overline{P_k(\epsilon)} \le 2^{-nE_k(R)} \tag{126} \]

where

\[ E_k(R) = \max_{0 \le p \le k} \left[ pR_p - pR \right]. \tag{127} \]

The exponent $E_k(R)$ is the upper envelope of the straight lines $pR_p - pR$ for all
$0 \le p \le k$ (see Fig. 22). At R = I(x,y), $E_k(R) = 0$. For R at or below the point of tangency
of the straight line of slope $-k$ to the curve $E_\infty(R) \triangleq \lim_{k \to \infty} E_k(R)$, the
exponent $E_k(R)$ increases with decreasing R along a straight line of slope $-k$, reaching
$kR_k$ at R = 0. The limiting exponent $E_\infty(R)$, as well as $E_k(R)$, depends on the probability
assignment $\{p_k\}$. If $E_\infty(R)$ is maximized on $\{p_k\}$, one finds that the resulting exponent
equals the "sphere-packing" exponent. This latter exponent is an exponent on a lower bound to the
probability of error which applies to every block decoding procedure, list decoding or otherwise,
and as such the "sphere-packing" exponent represents the largest possible exponent on the
probability of error with any block decoding procedure. It is a fundamental bound on exponents
to the probability of error.
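
The list decoding exponent of Eq. (127) can be approximated by a grid search over p. The sketch below is only an illustration: it assumes $pR_p$ as in Eq. (122), and the function names and grid resolution are arbitrary choices.

```python
import math

def p_times_rate(p, probs, channel):
    """p * R_p from Eq. (122)."""
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(pk * row[j] ** (1.0 / (1.0 + p)) for pk, row in zip(probs, channel))
        total += inner ** (1.0 + p)
    return -math.log2(total)

def list_exponent(rate, k, probs, channel, steps=2000):
    """E_k(R) = max over 0 <= p <= k of (p * R_p - p * R), by grid search on p."""
    best = 0.0   # the line for p = 0 is identically zero
    for i in range(1, steps + 1):
        p = k * i / steps
        best = max(best, p_times_rate(p, probs, channel) - p * rate)
    return best

p0 = 0.01
bsc = [[1 - p0, p0], [p0, 1 - p0]]
uniform = [0.5, 0.5]
e1 = list_exponent(0.3, 1, uniform, bsc)
e2 = list_exponent(0.3, 2, uniform, bsc)
```

Because the maximization range grows with k, a larger list can only raise the exponent, and at R = 0 the maximum sits at p = k so that $E_k(0) = kR_k$.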

We now return to the conjectured computation exponent $e^*(R)$. A simple construction on
$E_\infty(R)$ yields $e^*(R)$ (see Fig. 23). From R a straight line tangent to $E_\infty(R)$ is drawn;
$e^*(R)$ is the height of the intersection with the exponent axis. This straight line has equation
$pR_p - pR$ for some p by definition of $E_\infty(R)$, where p is the magnitude of the slope of the
tangent line.
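
The construction can be verified in two lines. The derivation below is a worked check, assuming that $E_\infty(R)$ is the upper envelope of the lines $pR_p - pR$ over all $p \ge 0$; the symbol $\ell_p$ and the dummy rate variable $R'$ are introduced here only for illustration.

```latex
% Tangent line of slope -p, written as a function of a dummy rate R'
\ell_p(R') = pR_p - pR'
% It meets the rate axis where \ell_p(R') = 0, that is, at R' = R_p,
% so the tangent drawn from the point R on the rate axis is the one with R_p = R.
% Its intercept with the exponent axis is
\ell_p(0) = pR_p = e^{*}(R_p)
```

Thus the height read off the exponent axis is exactly the conjectured exponent of Eq. (125) at the rate from which the tangent was drawn.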

Although the conjectured computation exponent [which equals $\underline{e}(R)$ for $R = R_p$, p = 1, 2, ...]
has  an  interpretation  in  terms  of  the  "list  decoding  exponent"  and  the  "sphere-packing"  exponent, 
there  is  no  obvious  connection  between  them.  Since  the  latter  two  exponents  are  fundamental  in 
a  sense,  the  fact  that  the  conjectured  exponent  is  interpreted  from  them  suggests  that  this  expo¬ 
nent  may  also  be  fundamental.  Unfortunately,  there  is  no  other  evidence  to  suggest  that  this  is 
the  case. 

E.  OVERFLOW  QUESTION 

In  this  section,  we  establish  a  heuristic  connection  between  the  probability  distribution  of 
"static"  computation,  which  we  have  studied  extensively,  and  the  probability  of  buffer  overflow. 
Our discussion will indicate the sensitivity of the overflow probability to signaling rate R, to
machine speed, to buffer size, and to the number of digits decoded before overflow. We begin by
summarizing  the  discussion  of  Chapter  II  on  the  overflow  event. 

We  assume  that  the  Fano  decoder  operates  with  the  buffer  shown  in  Fig.  24.  Branches  arrive 
from  the  channel  and  are  inserted  at  the  left-hand  end  of  the  buffer.  They  move  through  the 
buffer  at  the  rate  at  which  they  arrive  and  are  released  when  they  reach  the  right-hand  side  of 
the  buffer.  Below  each  branch,  space  is  provided  to  record  tentative  decisions  on  the  source 
digits.  This  portion  of  the  buffer  is  empty  to  the  left  of  the  pointer  "search." 

As  the  decoder  proceeds,  it  inserts  or  erases  tentative  source  decisions  recorded  below  the 
tree  branches.  These  insertions  or  erasures  occur  at  the  search  pointer  because  this  pointer 
indicates  the  received  tree  branch  presently  being  examined  by  the  machine.  The  pointer  "ex¬ 
treme"  indicates  the  latest  received  tree  branch  examined  to  date.  Branches  to  the  left  of  this 
pointer  have  never  been  compared  to  branches  in  the  tree  code. 

The search and extreme pointers hover near the left-hand side of the buffer when the decoder
has little trouble decoding. Occasionally, however, an interval of high channel noise forces a
large amount of computation and the two pointers drift to the far right end of the buffer. When
this happens, there is a high probability that an erroneous digit will be released into the safety
zone. Since the decoder is unable to change digits in the safety zone (the corresponding received
branches have been discarded), the decoder is forced to consider extending on incorrect paths.
This is very difficult, so that thereafter both pointers tend to hover near the far end of the buffer,
releasing erroneous digits. Although overflow can be detected, it is a serious disturbance and
must be combated either with the use of a feedback channel or periodic resynchronization or by
some other means. We will attempt to estimate the sensitivity of the overflow probability to the
system parameters.

† The exponent is defined as $\lim_{n \to \infty} (-\log_2 P(\epsilon))/n$.

Fig. 22. List decoding exponent.

Fig. 23. Construction for $e^*(R)$.

Fig. 24. Buffer.

Now that we understand the meaning of overflow, we return to a consideration of "static"
computation. Our intention is to lay the groundwork for a discussion of $P_{BF}(N)$, the probability
of a buffer overflow on or before the time at which the N-th source decision enters the safety zone.

Consider the q-th node on the correct path, (1, 0, q). "Static" computation associated with
the q-th correct node is defined as the computation eventually performed with the Fano algorithm
on nodes of the q-th incorrect subset when the correct message is ultimately decoded. We now argue
that whatever computation is performed in this incorrect subset is performed on nodes which are
close to the reference node (1, 0, q) and that almost all of these computations are performed
together in time rather than a substantial fraction now and a comparable fraction later. We are
in effect going to argue that "static" computation is very closely related to "dynamic"
computation. The argument is as follows:

(1)  For a properly chosen code and for a reasonable range of signaling rates,
$R < R_1$, computation in an incorrect subset is due almost completely to an
interval of high channel noise and a concomitant dip in the correct path.
We argue that this is true by noting that if the correct path does not dip,
the decoder will never be searching far from the correct path.

(2)  Let  W  be  the  width  of  a  dip  in  the  correct  path  (the  separation  between 
points  A  and  B  in  Fig.  25).  Let  the  magnitude  of  the  dip  remain  fixed. 

Then  it  can  be  shown  that  a  dip  of  width  W  occurs  with  a  probability 
which  decreases  exponentially  fast  in  W.  Therefore,  this  width  will 
typically  be  small. 

(3)  If the q-th correct node (1, 0, q) is in the region of a dip in the correct
path (see Fig. 25), then paths in the associated incorrect subset may be
above the minimum of the dip over the region A to B of Fig. 25, but
beyond B they will typically fall rapidly below the dip minimum never
to be extended.

(4)  It is conceivable that a dip far ahead of a particular correct node will
force a return to the incorrect subset associated with this node. The
probability of such an event is very small as is seen from the following
observations: Typically, the correct path will rise from a particular
correct node [see (1, 0, q') of Fig. 25]. If a later dip in the correct path
is to force a return to node (1, 0, q'), this dip will have to equal or
exceed the rise which previously occurred in the correct path. If such a
dip occurs far in the future, it will typically be very large in magnitude.
Such an event is very unlikely. It occurs with a probability which
decreases exponentially in the magnitude of the dip.

Fig. 25. Typical correct path trajectory.

(5)  Thus, if computation is required in the q-th incorrect subset, with high
probability it will be due to a dip in the correct path which is close to
the q-th correct node. Since the width of the dip will typically be small,
all the computation performed in the q-th incorrect subset is usually
performed on nodes close to the q-th correct node. The behavior of the
probabilities mentioned in (2) and (4) can be established with a "random code"
argument.


Statement 5 summarizes the argument which suggests that "static" computation is related
to "dynamic" computation. We note that the "static" computations in the adjacent incorrect
subsets which are located within the region of a correct path dip (C to B in Fig. 25) will be
comparable, so that the total "dynamic" computation due to the dip will be a small multiple, say
$N_{avg}$, of the "static" computation in one incorrect subset. We also note that the pointers "search"
and "extreme" indicated in the buffer description may also be applied to the path trajectories of
Fig. 25. As a result of a correct path dip, the extreme pointer will move out to point B and will
typically remain there until the running threshold has been reduced sufficiently to pass the correct
path. It is this argument which justifies our comparing the computation exponent bounds to the
data taken from the Lincoln Laboratory simulation. We may also observe from the discussion
of Chapter III that the computation increases exponentially with the width of the correct path dip
so that for a dip which causes a large computation, the extreme pointer of Fig. 24 will drift back
by an amount x while the extreme and search pointers will have a separation proportional to
log x. We are now prepared to discuss the overflow probability.

The buffer overflow probability $P_{BF}(N)$ is defined as the probability that overflow occurs on
or before the time at which the N-th source decision reaches the safety zone. It certainly exceeds
$P_{BF}(1)$, that is,

\[ P_{BF}(N) \ge P_{BF}(1). \tag{128} \]

First, we shall consider $P_{BF}(1)$ in order to bring out the dependence of $P_{BF}(N)$ on signaling
rate R, machine speed, and buffer size.

$P_{BF}(1)$ is the probability that the buffer overflows on or before the time at which the first
source decision reaches the safety zone. Since the buffer is empty before the first received
branch enters the buffer, overflow can occur if computation in the first incorrect subset and
adjacent subsets is sufficient to force the search pointer from the left- to the right-hand side of
the buffer. Large computation in these subsets (let there be $N_{avg}$ of them) is due to a local dip
in the correct path, so that if the total "static" computation over these $N_{avg}$ incorrect subsets
exceeds $L_0$, where $L_0$ is the number of computations needed to force the search pointer to the far
end of the buffer, then overflow occurs. If $\tau_{ch}$ is the time between branch arrivals and B is the
number of branches which may be stored in the buffer, then it takes $B\tau_{ch}$ seconds to fill the
buffer. We neglect the distance between the search and extreme pointers and assume that each
computation requires $\gamma_m$ seconds. Then if $L_0 = B\tau_{ch}/\gamma_m$ or more computations are
required in the first $N_{avg}$ incorrect subsets, overflow will result. If the computation in these
subsets is comparable, and if the "static" computation in each one of them exceeds
$B\tau_{ch}/N_{avg}\gamma_m$, overflow occurs. Therefore,

\[ P_{BF}(1) \simeq P_R \left[ C \ge \frac{B\tau_{ch}}{N_{avg}\gamma_m} \right]. \tag{129} \]


We may deduce from the fact that $P_R[C \ge L]$ behaves as $L^{-e(R)/R}$ for large L, where e(R)
is the computation exponent, that $P_{BF}(1)$ is relatively insensitive to a change in B, the storage
capacity of the buffer, or to a change in $\gamma_m$, the time for one machine computation. $P_{BF}(1)$ is
very sensitive to signaling rate, however, because the exponent e(R)/R increases rapidly with
a decrease in rate. These are the sensitivities mentioned in Chapter II. Let us now consider
the sensitivity of $P_{BF}(N)$ to N.
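
The sensitivities just described can be made concrete with a small numerical sketch. It assumes a pure power-law tail $P_R[C \ge L] = A L^{-\alpha}$ with $\alpha = e(R)/R$ inserted into Eq. (129). The constant `scale` (A), the parameter values, and the function name are hypothetical and chosen only for illustration.

```python
import math

def overflow_estimate(buffer_branches, tau_ch, gamma_m, n_avg, alpha, scale=1.0):
    """Eq. (129) with an assumed tail A * L^-alpha, where L = B*tau_ch/(N_avg*gamma_m)."""
    threshold = buffer_branches * tau_ch / (n_avg * gamma_m)
    return min(1.0, scale * threshold ** (-alpha))

# Hypothetical operating point: 1000-branch buffer, 100 computations per branch time.
base = overflow_estimate(1000, 1e-3, 1e-5, 5, alpha=2.0)
double_buffer = overflow_estimate(2000, 1e-3, 1e-5, 5, alpha=2.0)
double_speed = overflow_estimate(1000, 1e-3, 0.5e-5, 5, alpha=2.0)
# A lower rate raises alpha = e(R)/R; here alpha is simply doubled for illustration.
lower_rate = overflow_estimate(1000, 1e-3, 1e-5, 5, alpha=4.0)
```

Doubling the buffer or the machine speed changes the estimate only by the fixed factor $2^{-\alpha}$, whereas a change in the exponent acts on the whole threshold, so that doubling $\alpha$ squares the estimate when the scale constant is one.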

It should be clear that $P_{BF}(N)$ will increase rapidly to one with N, the number of source
decisions released into the safety zone, if the average number of decoding operations required
by the Fano algorithm exceeds the number of computations per second which the decoder can
perform. We find from inspection of the conjectured computation exponent that the average
computation required by the algorithm is very large if $R \ge R_{comp}$. Therefore, $P_{BF}(N)$ must
grow rapidly to one with N for $R \ge R_{comp}$. This then is an upper limit to the rate at which the
Fano algorithm may operate with infrequent overflows. It has been shown that the average
computation is small if $R \le 0.9 R_{comp}$, being several computations per decoded digit. Thus, if
the machine speed is such that several times this number of computations per second can be
performed, then we do not expect $P_{BF}(N)$ to grow rapidly with N. In fact, one may reasonably
argue that decreasing the signaling rate rapidly decreases the probability of frequent intervals
of large "dynamic" computation, and this implies that with a reduction in signaling rate the
machine decodes easily and both the search and extreme pointers hover near the left-hand end
of the buffer. If large computations are infrequent, we expect only one burst of computation at
a time, which is to say, that bursts will be statistically independent. $P_{BF}(N)$ then is
proportional to N and $P_{BF}(1)$, that is,

$$P_{BF}(N) \approx N\,P_{BF}(1) \qquad (130)$$

when $R \le 0.9 R_{comp}$, $P_{BF}(1)$ is small, and the machine speed exceeds by several times the speed
required to handle the average computation.
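
The proportionality in Eq. (130) can be checked numerically under the independence assumption stated above. The following is a minimal sketch, not a result of this report: it treats overflow in each digit as an independent event with a small probability `p1` (standing in for $P_{BF}(1)$) and compares the resulting probability of at least one overflow in N digits with the linear estimate. The function names and the sample value of `p1` are illustrative assumptions.

```python
import math


def overflow_prob_independent(p1: float, n: int) -> float:
    """P(at least one overflow in n digits) when each digit overflows
    independently with probability p1 (an idealizing assumption)."""
    # 1 - (1 - p1)**n, written with expm1/log1p to keep precision for tiny p1.
    return -math.expm1(n * math.log1p(-p1))


def overflow_prob_linear(p1: float, n: int) -> float:
    """Linear estimate n * p1, the form of Eq. (130)."""
    return n * p1


if __name__ == "__main__":
    p1 = 1e-7  # hypothetical single-digit overflow probability
    for n in (10, 1_000, 100_000):
        exact = overflow_prob_independent(p1, n)
        approx = overflow_prob_linear(p1, n)
        print(f"N={n:>7}  independent={exact:.6e}  linear={approx:.6e}")
```

The two values agree closely as long as the product of N and `p1` stays small, which is the regime in which Eq. (130) is offered as a guideline.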

While  the  statements  of  this  section  are  strictly  heuristic,  there  is  good  reason  to  believe 
Eq.  (129)  because  of  the  experimental  result  cited  above.  The  statement  of  Eq.  (130)  is  less 
secure  than  that  of  Eq.  (129).  At  best,  it  may  serve  as  a  guideline. 

This  completes  the  discussion  of  overflow  probability. 


F.  SOME  RESEARCH  PROBLEMS 

We  conclude  this  chapter  with  a  discussion  of  some  problems  suggested  by  the  results  of 
this  report.  We  shall  discuss  these  suggested  problems  in  inverse  order  of  importance. 

The distribution of "static" computation and the probability of buffer overflow were loosely
connected  in  the  previous  section.  It  is  unfortunate  that  the  connection  had  to  be  heuristic. 
Perhaps  a  more  direct  connection  is  possible. 

If  a  direct,  nonheuristic,  approach  to  the  probability  of  buffer  overflow  cannot  be  found, 
then  the  heuristic  approach  of  the  last  section  should  be  improved  by  improving  the  bounds  on 
the  distribution  of  "static"  computation.  In  particular,  there  is  reason  to  believe  that  a  stronger 
lower  bound  argument  than  that  presented  in  Chapter  III  may  be  found  and  that  such  a  bound  would 
not require the assumption that the DMC is completely connected.

A more important problem than the two suggested concerns the choice of a path metric. The
metric assumed for this report, Eq. (117), requires exact knowledge of the channel transition
probabilities. There are several reasons for not using a metric of this type.

(1)  It  may  be  too  difficult  to  measure  the  channel  transition  probabilities; 

(2)  The channel may be time varying so that a metric for the poorest channel
state  may  be  necessary; 

(3)  The  channel  transition  probabilities  may  be  known  but  they  may  be  either 
so  large  in  number  or  sufficiently  difficult  to  compute  in  the  decoder  that 
some  other  metric  is  desirable. 

Thus, there is a need to consider the performance of the Fano Sequential Decoding algorithm
with a variety of metrics. If we choose to measure the performance of the algorithm with the
computation exponent, an analytical treatment of the various metrics may be possible using the
technique of Chapter III. It is not expected that the "random code" argument will carry through
for many different metrics. It is more reasonable to expect, however, that a fruitful study of
the effect of a change in metric on the Fano algorithm will be achieved through simulation. A
preliminary study of this type has been completed at Lincoln Laboratory.¹³ The behavior of the
Fano algorithm appears to be insensitive to a variation in metric.
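
To make the question of metric choice concrete, here is a minimal sketch of how a mismatched metric changes the average metric increment along the correct path. It is an illustration under stated assumptions, not the metric of Eq. (117), which is not reproduced in this section: it uses the commonly quoted Fano-type symbol metric log2(p(y|x)/f(y)) minus the rate, a binary symmetric channel with equiprobable inputs, and a rate measured in bits per channel symbol. The names `symbol_metric`, `mean_drift`, `q_true`, and `q_assumed` are hypothetical.

```python
import math


def symbol_metric(agree: bool, q_assumed: float, rate: float) -> float:
    """Fano-type metric for one BSC symbol: log2(p(y|x) / f(y)) - rate,
    with f(y) = 1/2 for equiprobable inputs. q_assumed is the crossover
    probability the decoder believes, which need not be the true one."""
    p = (1.0 - q_assumed) if agree else q_assumed
    return math.log2(p / 0.5) - rate


def mean_drift(q_true: float, q_assumed: float, rate: float) -> float:
    """Average metric increment per symbol along the correct path when the
    channel really has crossover probability q_true."""
    return ((1.0 - q_true) * symbol_metric(True, q_assumed, rate)
            + q_true * symbol_metric(False, q_assumed, rate))


if __name__ == "__main__":
    q_true, rate = 0.05, 0.5  # illustrative values only
    for q_assumed in (0.02, 0.05, 0.10):
        drift = mean_drift(q_true, q_assumed, rate)
        print(f"assumed q={q_assumed:.2f}  mean drift={drift:.4f}")
```

In this toy setting the matched metric gives the largest average drift, and a moderately wrong assumed crossover probability lowers it only slightly. That is compatible with, but does not establish, the insensitivity reported above.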

We come now to the most important problem area suggested by this report, that of overflow.
Since it occurs with a much larger probability than do undetected decoding errors, it deserves
further examination. In our study of the overflow probability $P_{BF}(N)$ we have found that it is
insensitive to buffer size and machine speed, but strongly dependent on signaling rate. This
suggests that a sizable decrease in $P_{BF}(N)$ is obtainable only with a decrease in rate. For many
applications, large signaling rate is desired. Hence, if $P_{BF}(N)$ could be made to decrease more
rapidly with buffer size and machine speed, then the decoder could operate at a higher rate with
an equal overflow probability. We are motivated then to consider ways of reducing the size of
the "static" computation for each channel noise sequence. As mentioned in Chapter III, for
Sequential Decoding there exists some high channel noise sequence such that "static" computation
is large and growing exponentially with the length of this interval of high channel noise. If the
rate of growth of computation with such a channel noise sequence is reduced, then $P_{BF}(N)$ will
decrease more rapidly with buffer size and machine speed.

Conceivably,  a  reduction  in  the  rate  of  growth  of  computation  with  channel  noise  is  possible 
by  modifying  the  Fano  algorithm.  If  the  rate  of  growth  of  computation  with  a  modified  algorithm 
remains  exponential,  then  the  modified  algorithm  should  be  expected  to  be  similar  in  design  and 
performance  to  the  Fano  algorithm.  If  the  rate  of  growth  realized  is  nonexponential,  it  is  doubtful 
that  the  modified  algorithm  will  resemble  the  Fano  algorithm  in  any  way.  Exponential  growth  of 
computation  seems  to  be  characteristic  of  this  algorithm. 

If  the  rate  of  growth  of  computation  is  to  be  nonexponential,  there  is  some  question  that  the 

probability  of  error  can  be  made  to  decrease  with  the  constraint  length  of  the  code  S  as  fast  as 

S  Ei  ( )  T 

Z  ,  as  it  does  for  Sequential  Decoding  algorithms.  As  a  matter  of  fact,  there  are  a  number

of  decoding  procedures  for  which  the  computation  is  bounded  by  a  function  which  is  algebraic  in 

the  constraint  length  or  block  length  S,  that  is,  which  grows  no  faster  than  for  some  ^  0; 

—  E(R) 

but  at  the  same  time  the  error  probability  decreases  only  as  Z”  ^  ,  where  c  is  some  num- 

3  9  10 

ber  strictly  greater  than  zero.  '  '  There  seems  to  be  an  important  sacrifice  in  error  prob¬ 
ability  for  a  reduction  in  computation.  Since  a  small  error  probability  can  be  realized  with 
small cost, a trade-off of this type may be desirable. We are prompted to suggest that the
obtainable  trade-off  between  computation  and  error  probability  is  limited  by  the  channel  and  the 
signaling  rate.  If  such  a  trade-off  exists,  the  knowledge  of  the  best  balance  between  computa¬ 
tion  and  error  probability  would  be  of  great  conceptual,  and  ultimately,  practical  interest. 

Note added in proof: In a recent paper to be published, I. Jacobs and E. Berlekamp through
a  direct  argument  have  underbounded  the  probability  of  a  buffer  overflow  or  an  undetected  error. 
This  bound  grows  linearly  with  the  number  of  information  digits  processed  by  the  decoder  and  it 
has  as  computation  exponent  that  given  by  the  conjecture  of  this  chapter. 

Also, H. Yudkin has recently shown that the random code bound of Chapter 4 can be refined
so that the lower bound to the computation exponent agrees with the conjectured exponent for
rates less than $R_{comp}$.

APPENDIX

LEMMAS 


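Lemma 2. (Minkowski's Inequality)

Let $\{w_h\}$, $1 \le h \le H$, be a set of positive random variables. Then,

$$\left[\,\overline{\left(\sum_{h=1}^{H} w_h\right)^{p}}\,\right]^{1/p} \le \sum_{h=1}^{H} \left(\overline{w_h^{\,p}}\right)^{1/p}, \qquad p \ge 1 .$$

Proof.

Hölder's inequality, established below, will be used. Write

$$s = w_1 + \ldots + w_H$$

and note that $s^p = s \cdot s^{p-1}$. Using Hölder's Inequality for two variates with $\nu_1 = p$ and
$\nu_2 = p/(p-1)$ we have

$$\overline{s^p} = \sum_{h=1}^{H} \overline{w_h\, s^{p-1}} \le \sum_{h=1}^{H} \left(\overline{w_h^{\,p}}\right)^{1/p} \left(\overline{s^p}\right)^{1-1/p} .$$

Then,

$$\left(\overline{s^p}\right)^{1/p} = \left[\,\overline{\left(\sum_{h=1}^{H} w_h\right)^{p}}\,\right]^{1/p} \le \sum_{h=1}^{H} \left(\overline{w_h^{\,p}}\right)^{1/p} .$$

Q. E. D.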
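
A numerical spot check of Minkowski's inequality, as a sketch only: the random variables are taken to live on a small finite sample space, the overbar is read as the expectation over that space, and the sample values are arbitrary. The helper names are illustrative assumptions.

```python
import random


def expectation(values, probs):
    """Average of values[i] weighted by probs[i]."""
    return sum(p * v for p, v in zip(probs, values))


def minkowski_sides(ws, probs, p):
    """ws[h][i] is the value of w_h at sample point i (probability probs[i]).
    Returns (left side, right side) of Minkowski's inequality."""
    totals = [sum(column) for column in zip(*ws)]
    lhs = expectation([s ** p for s in totals], probs) ** (1.0 / p)
    rhs = sum(expectation([v ** p for v in w], probs) ** (1.0 / p) for w in ws)
    return lhs, rhs


def random_distribution(n, rng):
    """A random probability vector of length n with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [x / total for x in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = random_distribution(5, rng)
    ws = [[rng.uniform(0.1, 4.0) for _ in range(5)] for _ in range(3)]
    for p in (1.0, 1.5, 2.0, 4.0):
        lhs, rhs = minkowski_sides(ws, probs, p)
        print(f"p={p}: lhs={lhs:.6f}  rhs={rhs:.6f}")
```

For p = 1 the two sides coincide, since the expectation of a sum is the sum of the expectations.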


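Lemma 7. (Hölder's Inequality)

Let $\{w_h\}$, $1 \le h \le H$, be a set of positive random variables and let $\{\nu_h\}$, $1 \le h \le H$, be
a set of positive numbers satisfying

$$\sum_{h=1}^{H} \frac{1}{\nu_h} = 1 .$$

Then,

$$\overline{\prod_{h=1}^{H} w_h} \le \prod_{h=1}^{H} \left(\overline{w_h^{\,\nu_h}}\right)^{1/\nu_h} .$$

Proof.

It suffices to establish that

$$\overline{ab} \le \left(\overline{a^{\nu}}\right)^{1/\nu} \left(\overline{b^{\eta}}\right)^{1/\eta}, \qquad a, b \ge 0$$

when $(1/\nu) + (1/\eta) = 1$ since this inequality may be iterated to obtain the inequality of the lemma.
Let the joint probability that $a = a_i$ and $b = b_i$ be $p_i$. Then,

$$\overline{ab} = \sum_i p_i a_i b_i .$$

Let $\theta(t) = t^{1/\nu} - (1/\nu)\,t$ for $t > 0$. Then,

$$\theta'(t) = \frac{1}{\nu}\left(t^{(1/\nu)-1} - 1\right) \;
\begin{cases} > 0, & 0 < t < 1 \\ = 0, & t = 1 \\ < 0, & t > 1 \end{cases}$$

Therefore, $\theta(t)$ achieves a maximum at $t = 1$ over the range $t > 0$. Hence,

$$\theta(t) \le \theta(1) = 1 - \frac{1}{\nu} = \frac{1}{\eta} .$$

Let $t = A/B$ and multiply by B where both A and B are positive to obtain the following

$$A^{1/\nu} B^{1/\eta} \le \frac{A}{\nu} + \frac{B}{\eta} .$$

Now, choose

$$A = \frac{p_i a_i^{\nu}}{\sum_m p_m a_m^{\nu}}, \qquad B = \frac{p_i b_i^{\eta}}{\sum_m p_m b_m^{\eta}} .$$

Replacing A and B by their values and summing on i, we arrive at the desired inequality,
namely,

$$\overline{ab} \le \left(\overline{a^{\nu}}\right)^{1/\nu} \left(\overline{b^{\eta}}\right)^{1/\eta} .$$

Q. E. D.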
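
The two-variate Hölder inequality, and the pointwise bound on which its proof rests, can be spot-checked numerically. This is a sketch under the assumption that the variates take finitely many values with probabilities `probs`; the names `holder_sides` and `pointwise_gap` are illustrative.

```python
import random


def expectation(values, probs):
    """Average of values[i] weighted by probs[i]."""
    return sum(p * v for p, v in zip(probs, values))


def holder_sides(a, b, probs, nu):
    """Both sides of E[ab] <= E[a**nu]**(1/nu) * E[b**eta]**(1/eta), nu > 1."""
    eta = nu / (nu - 1.0)  # conjugate exponent: 1/nu + 1/eta = 1
    lhs = expectation([x * y for x, y in zip(a, b)], probs)
    rhs = (expectation([x ** nu for x in a], probs) ** (1.0 / nu)
           * expectation([y ** eta for y in b], probs) ** (1.0 / eta))
    return lhs, rhs


def pointwise_gap(A, B, nu):
    """A/nu + B/eta - A**(1/nu) * B**(1/eta); nonnegative for A, B > 0."""
    eta = nu / (nu - 1.0)
    return A / nu + B / eta - A ** (1.0 / nu) * B ** (1.0 / eta)


if __name__ == "__main__":
    rng = random.Random(1)
    weights = [rng.random() + 1e-3 for _ in range(6)]
    probs = [x / sum(weights) for x in weights]
    a = [rng.uniform(0.1, 3.0) for _ in range(6)]
    b = [rng.uniform(0.1, 3.0) for _ in range(6)]
    for nu in (1.5, 2.0, 3.0):
        lhs, rhs = holder_sides(a, b, probs, nu)
        print(f"nu={nu}: E[ab]={lhs:.6f}  bound={rhs:.6f}")
```

The pointwise gap vanishes when A equals B, which corresponds to the maximum of the auxiliary function at t = 1 in the proof.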


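Lemma 8.

As defined below, $R_\beta$ is a monotone decreasing function of increasing $\beta$ for $\beta \ge 0$.

$$R_\beta = -\frac{1}{\beta} \log_2 \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+\beta)} \right)^{1+\beta}$$

Proof.

Let $E(\beta) = \beta R_\beta$. Then,

$$\frac{dR_\beta}{d\beta} = \frac{\beta E'(\beta) - E(\beta)}{\beta^2} .$$

At $\beta = 0$ the numerator is zero. Its derivative is $\beta E''(\beta)$. We show below that $E''(\beta) \le 0$; hence,
the numerator is negative for $\beta > 0$ as is the derivative of $R_\beta$.

To show that $E''(\beta) \le 0$ we shall demonstrate that $E(\beta)$ is a convex upward function.

$$E(\beta) = -\log_2 \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k\, p[y_j/x_k]^{1/(1+\beta)} \right)^{1+\beta}$$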

j=l  'k=l 

Hölder's Inequality for the two variate case will be used twice. We apply it to the inner sum
above with

$$\nu_1 = \frac{1+\beta}{\lambda(1+\beta_1)}, \qquad \nu_2 = \frac{1+\beta}{(1-\lambda)(1+\beta_2)}$$

where $0 < \lambda < 1$ and $\beta = \lambda\beta_1 + (1-\lambda)\beta_2$.


K 


Z  PkPtyj/^ki 

k=l 


1/(1+^) 


PkP  tyj/^k' 


i/Mi+P^) 


Mi+P.) 

) 


K 

Z 

k=l 


PkP  [yjAk^ 


l/[(l-X)(l+^,)] 


(l-X)(l+^2) 


Applying Hölder's Inequality to the double sum in the definition of $E(\beta)$ with $\nu_1 = 1/\lambda$,
$\nu_2 = 1/(1-\lambda)$ we have


J  ,  K 


r  J  y  K 


z  (  Z  PkP[yAi^/‘"'T'<  s  PkPtyAi' 

3=1  \k=l  '  Lj=l  'k=l  ' 


,  1+^1-.^ 
iA(i+^i)\ 


J  ,  K 


Z  (  h  PkP[yA  ) 

1  =  1  ^k=l  ' 


1+^2  1-X 


The inequality is strengthened if the exponents of $p[y_j/x_k]$ are replaced by $1/(1+\beta_i)$. Then,

$$E[\lambda\beta_1 + (1-\lambda)\beta_2] \ge \lambda E(\beta_1) + (1-\lambda) E(\beta_2)$$

which establishes that $E''(\beta) \le 0$. Q. E. D.
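
The monotonicity asserted by Lemma 8 can be illustrated numerically. The sketch below assumes the form $E(\beta) = -\log_2 \sum_j (\sum_k p_k\, p[y_j/x_k]^{1/(1+\beta)})^{1+\beta}$ with $R_\beta = E(\beta)/\beta$, and evaluates it for a binary symmetric channel with equiprobable inputs. The channel, the crossover value, and the function names are illustrative assumptions, not data from this report.

```python
import math


def E_of_beta(beta, p_in, trans):
    """E(beta) = -log2 sum_j ( sum_k p_k * P[j|k]**(1/(1+beta)) )**(1+beta).
    trans[k][j] is the probability of output j given input k."""
    total = 0.0
    for j in range(len(trans[0])):
        inner = sum(pk * trans[k][j] ** (1.0 / (1.0 + beta))
                    for k, pk in enumerate(p_in))
        total += inner ** (1.0 + beta)
    return -math.log2(total)


def R_of_beta(beta, p_in, trans):
    """R_beta = E(beta) / beta, defined here for beta > 0."""
    return E_of_beta(beta, p_in, trans) / beta


if __name__ == "__main__":
    q = 0.05  # hypothetical crossover probability
    p_in = [0.5, 0.5]
    trans = [[1.0 - q, q], [q, 1.0 - q]]
    for beta in (0.1, 0.5, 1.0, 2.0, 4.0):
        print(f"beta={beta:<4} R_beta={R_of_beta(beta, p_in, trans):.4f}")
```

The printed values fall as $\beta$ grows, in line with the lemma.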

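Lemma 9.

Let w be a positive random variable and $0 < \nu < \eta$. Then,

$$\left(\overline{w^{\nu}}\right)^{1/\nu} \le \left(\overline{w^{\eta}}\right)^{1/\eta} .$$

Proof.

Let $w = w_i$ with probability $p_i$; then,

$$\overline{w^{\nu}} = \sum_i p_i w_i^{\nu} .$$

We have

$$\frac{d}{d\nu}\left(\overline{w^{\nu}}\right)^{1/\nu}
= \left(\overline{w^{\nu}}\right)^{1/\nu} \left[ -\frac{1}{\nu^2} \ln \sum_i p_i w_i^{\nu} + \frac{1}{\nu}\, \frac{\sum_i p_i w_i^{\nu} \ln w_i}{\sum_i p_i w_i^{\nu}} \right]
= \frac{1}{\nu^2} \left(\overline{w^{\nu}}\right)^{1/\nu} \left( \sum_i Q_i \ln \frac{Q_i}{p_i} \right)$$

where

$$Q_i = \frac{p_i w_i^{\nu}}{\sum_m p_m w_m^{\nu}} > 0$$

and

$$\sum_i Q_i = 1 .$$

Using the standard inequality $\ln x \ge 1 - (1/x)$, the derivative is lower bounded by

$$\frac{d}{d\nu}\left(\overline{w^{\nu}}\right)^{1/\nu} \ge \frac{1}{\nu^2} \left(\overline{w^{\nu}}\right)^{1/\nu} \left( \sum_i Q_i - \sum_i p_i \right) = 0 .$$

Q. E. D.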
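
Lemma 9 states that the quantity $(\overline{w^{\nu}})^{1/\nu}$ does not decrease as the exponent grows. A short numerical illustration follows, assuming a random variable with finitely many positive values; the values, probabilities, and the name `power_mean` are illustrative assumptions.

```python
def power_mean(values, probs, nu):
    """(E[w**nu])**(1/nu) for a positive random variable taking values[i]
    with probability probs[i]; nu must be positive."""
    return sum(p * v ** nu for p, v in zip(probs, values)) ** (1.0 / nu)


if __name__ == "__main__":
    values = [0.5, 1.0, 3.0]  # arbitrary positive values
    probs = [0.2, 0.5, 0.3]   # arbitrary probabilities summing to one
    for nu in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(f"nu={nu:<4} power mean={power_mean(values, probs, nu):.6f}")
```

For a constant random variable every exponent gives the same value, which is the equality case of the lemma.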


Lemma  10. 


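The function $\sigma_0 R - \mu_\rho(\sigma_0)$, where $\mu_\rho(\sigma_0)$ is given by

$$\mu_\rho(\sigma_0) = \frac{1}{1+\rho} \log_2 \sum_{j=1}^{J} f(y_j) \left[ \sum_{k=1}^{K} p_k \left( \frac{p[y_j/x_k]}{f(y_j)} \right)^{1+\sigma_0} \right]^{\rho} ,$$

is positive for $\sigma' < \sigma_0 < 0$ where $\sigma'$ is such that $\mu_\rho(\sigma')/\sigma' = R$, and $\mu_\rho(\sigma_0)/\sigma_0$ is monotone
increasing in $\sigma_0$.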

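Proof.

For $\sigma_0 R - \mu_\rho(\sigma_0)$ to be positive we must have $R < \mu_\rho(\sigma_0)/\sigma_0$ since $\sigma_0 < 0$. If $\mu_\rho(\sigma_0)/\sigma_0$ is
monotone increasing in $\sigma_0$, the desired result is established. We shall now show that such is
true. The derivative

$$\frac{d}{d\sigma_0} \left[ \frac{\mu_\rho(\sigma_0)}{\sigma_0} \right] = \frac{\sigma_0\, \mu_\rho'(\sigma_0) - \mu_\rho(\sigma_0)}{\sigma_0^{2}}$$

is positive if the numerator is positive. Since the numerator is zero for $\sigma_0 = 0$ it suffices to
show that its derivative, $\sigma_0\, \mu_\rho''(\sigma_0)$, is negative for $\sigma_0 \le 0$ or $\mu_\rho''(\sigma_0) \ge 0$.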

Let $a_{jk} = p[y_j/x_k]/f(y_j)$. Then,


J  /  K  1+a  /  K  1+a  v 

I  i>Si  ° 


TogTe 


P  o 
also.


jJi'Ho’  )  ^ 

o  


TogTe 
A °) (a
Pk^jk °^"^jk


(l+p)Pp((T^) 


J  /  K  1+cr  \p  / 


1+a  \p  /  K  l+(7 

r\  I  r 


r  J 


— (1  +  p) 


(l+p)Pp(a^) 


K  1+a  \p  /  K  1+a 

r\\^  i 


2  p,a„  ^Ina 


k=l 


k  jk  jk 


-i2 


(l+p)p  (a^; 
2  P  o 


If  we  let 


/  K  l+cr\p 


l+(7 
(l+pVp(<^o)
1  ’jk
k=l 


both of which are "tilted" probabilities and let
K  1+a 

k?,  Vjk  "'"•jk 

- vr, — 

k=l 


then,  we  have 


log^  e  P 


J  ,  J  .  2 

E  V”/  -(  E  ”1'")) 

]  =  1  ^=1  ' 

+  Z  E  E  ‘5jki"^k) 

Lj=i  k=i  ^j=i  k=i  f 


J  K 


which is positive because both terms are variances. Therefore, $\mu_\rho(\sigma_0)/\sigma_0$ is monotone
increasing in increasing $\sigma_0$.

Q. E. D.